This is the most data-rich option. It brings down keywords, abstracts, citation information, country of origin, etc.
This option brings down paper titles, keywords, author affiliations and some other metadata, but not citation information or country information. You can get citations per year online if it is fewer than 10,000 records (see below).
You can get broad-scale data online without downloading 500 records at a time, but not keywords and other metadata.
First, set up R by loading the required packages and setting the working directory. Change the path in setwd to where you have saved your files.
If you did not follow Eddy’s instructions and have not installed the packages already, you will need to install them as well.
#unhash this if you can't follow instructions
#install.packages('bibliometrix')
#install.packages('tidyverse')
#install.packages("wordcloud")
#install.packages('tm')
#install.packages('RColorBrewer')
#load packages
library(bibliometrix)
library(tidyverse)
library(wordcloud)
library(tm)
library(RColorBrewer)
setwd("~/Documents/CoAuthorMS/parasitebibsearch/parasitesonly/")
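If you are not sure which packages are already installed, a small check can save reinstalling everything. The sketch below is one way to do it using only base R; `missing_packages` is a hypothetical helper name, not part of any package.

```r
# Packages used in this tutorial
pkgs <- c("bibliometrix", "tidyverse", "wordcloud", "tm", "RColorBrewer")

# Hypothetical helper: return the packages from `pkgs` that are not installed.
# system.file() returns an empty string when a package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
to_install  # names of packages still to install (empty if all are present)

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that running the check never changes your library by accident.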
We are going to use the bibliometrix package to load our many downloaded files, but first we are going to modify a function from that package to streamline reading the files in (don't worry about what is going on here).
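#adapted from bibliometrix: takes a vector of file paths and returns
#the lines of all files as one long character vector
readFilesmod<-function (...)
{
  arguments <- as.list(...)
  D = list()
  enc = "UTF-8"
  #seq_along() is safe even if no files are supplied (1:0 would loop twice)
  for (i in seq_along(arguments)) {
    D[[i]] = suppressWarnings(readLines(arguments[[i]], encoding = enc))
  }
  D = unlist(D)
  return(D)
}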
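If you do want to see what the function above is doing, the idea is simply "read each file, then stick all the lines together". Here is a minimal sketch of that idea with two throwaway files; `read_all_lines` is a hypothetical stand-in written for this example, and the file contents are invented.

```r
# Two small temporary files stand in for the downloaded .bib files
f1 <- tempfile(fileext = ".bib")
f2 <- tempfile(fileext = ".bib")
writeLines(c("@article{a,", "}"), f1)
writeLines(c("@article{b,", "}"), f2)

# Hypothetical helper: read every file and join the lines into one vector
read_all_lines <- function(paths) {
  unlist(lapply(paths, readLines, encoding = "UTF-8"))
}

all_lines <- read_all_lines(c(f1, f2))
length(all_lines)  # 4 lines: two from each file
```

The result is one character vector with the files in the order they were supplied, which is why the order of the file list does not matter much for the later conversion step.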
Now we are going to read our data in using the function from above. This will generate a huge character vector which is horrible to look at. We will use a function from the bibliometrix package to turn this into a table.
#pattern is a regular expression, so match a literal '.bib' at the end of the name
file_list<-list.files(pattern='\\.bib$',full.names=TRUE)
citations<-readFilesmod(dput(as.character(file_list)))
## c("./savedrecs.bib", "./savedrecs(1).bib", "./savedrecs(2).bib",
## "./savedrecs(3).bib")
citations_df <- convert2df(citations, dbsource = "isi", format = "bibtex")
##
## Converting your isi collection into a bibliographic dataframe
##
## Articles extracted 100
## Articles extracted 200
## Articles extracted 300
## Articles extracted 400
## Articles extracted 500
## Articles extracted 600
## Articles extracted 700
## Articles extracted 800
## Articles extracted 900
## Articles extracted 1000
## Articles extracted 1100
## Articles extracted 1200
## Articles extracted 1300
## Articles extracted 1400
## Articles extracted 1500
## Articles extracted 1507
## Done!
##
##
## Generating affiliation field tag AU_UN from C1: Done!
Now we have a dataframe, which is much easier! You can view this to see the various fields with the authors etc.
We can also subset the dataset here, for example if we only wanted to look at records from 2000-2005 (the publication year is stored under PY in the dataframe).
citations_df_2000_2005<-citations_df %>% filter(between(PY, 2000, 2005))
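Note that `between()` includes both end points, so records from 2000 and 2005 themselves are kept. A minimal base R sketch of the same subset on made-up data (the `toy` data frame is invented for illustration):

```r
# Invented example data: one row per record, PY = publication year
toy <- data.frame(PY = c(1999, 2000, 2003, 2005, 2006),
                  TI = c("A", "B", "C", "D", "E"),
                  stringsAsFactors = FALSE)

# Keep records published from 2000 to 2005, end points included.
# which() drops rows with a missing year, as filter() would.
toy_2000_2005 <- toy[which(toy$PY >= 2000 & toy$PY <= 2005), ]
toy_2000_2005$PY
```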
We are going to use the function biblioAnalysis to turn our dataframe into an object containing various tables. None of these statistics are particularly hard to generate or special; the function just does it all in one hit, which is nice! It builds an object with various dataframes stored in it (23 dataframes). To access a dataframe you can call it directly. Remember you can view the help file for a function at any time in RStudio, e.g. ?biblioAnalysis
This transformation drops some information, so we will go back to the original table every now and then, depending on what we want to do.
citations_ana <- biblioAnalysis(citations_df, sep = ";")
#you can call them directly with:
head(citations_ana$Years)
## [1] 2018 2018 2018 2018 2018 2018
#most of them are just straightforward transformations, e.g.
head(citations_ana$TotalCitation)
## [1] 0 0 0 1 0 0
#is just this column from the original dataframe
head(citations_df$TC)
## [1] 0 0 0 1 0 0
The handy thing about turning it into a bibliometrix object is that you can use the summary function on it. You can set the number of entries to return by changing k = X.
citations_ana.sum <- summary(object = citations_ana, k = 100, pause = FALSE)
##
##
## Main Information about data
##
## Documents 1507
## Sources (Journals, Books, etc.) 429
## Keywords Plus (ID) 3835
## Author's Keywords (DE) 3011
## Period 1966 - 2018
## Average citations per documents 30.1
##
## Authors 4639
## Author Appearances 7058
## Authors of single authored documents 69
## Authors of multi authored documents 4570
##
## Documents per Author 0.325
## Authors per Document 3.08
## Co-Authors per Documents 4.68
## Collaboration Index 3.26
##
##
## Annual Scientific Production
##
## Year Articles
## 1966 1
## 1968 1
## 1973 1
## 1974 1
## 1975 3
## 1977 2
## 1978 4
## 1979 3
## 1980 6
## 1981 3
## 1982 9
## 1983 7
## 1984 11
## 1985 5
## 1986 7
## 1987 15
## 1988 5
## 1989 8
## 1990 17
## 1991 67
## 1992 84
## 1993 70
## 1994 67
## 1995 66
## 1996 65
## 1997 72
## 1998 95
## 1999 82
## 2000 67
## 2001 51
## 2002 54
## 2003 57
## 2004 60
## 2005 41
## 2006 57
## 2007 42
## 2008 44
## 2009 40
## 2010 27
## 2011 27
## 2012 19
## 2013 22
## 2014 23
## 2015 28
## 2016 29
## 2017 23
## 2018 19
##
## Annual Percentage Growth Rate 6.610257
##
##
## Most Productive Authors
##
## Authors Articles Authors Articles Fractionalized
## 1 TIBAYRENC M 42 TIBAYRENC M 12.05
## 2 PRATLONG F 41 SUPURAN CT 10.32
## 3 SUPURAN CT 33 ANDREWS RH 5.93
## 4 DEDET JP 29 PRATLONG F 5.60
## 5 MATTIUCCI S 25 CLARK CG 5.33
## 6 ANDREWS RH 23 EVANS DA 4.48
## 7 NASCETTI G 23 DEDET JP 4.46
## 8 MILES MA 20 MATTIUCCI S 4.32
## 9 BARNABE C 19 PANIAGUA E 4.25
## 10 ROMANHA AJ 19 VILAS R 4.00
## 11 GRAMICCIA M 18 MILES MA 3.98
## 12 DUJARDIN JP 17 NASCETTI G 3.93
## 13 EVANS DA 17 CHILTON NB 3.69
## 14 BRENIERE SF 15 GOKA K 3.67
## 15 GATTI S 15 ROMANHA AJ 3.66
## 16 SCAGLIA M 15 GRAMICCIA M 3.62
## 17 CHIARI E 14 BARNABE C 3.60
## 18 PANIAGUA E 14 PETRI WA 3.48
## 19 CHILTON NB 13 AYALA FJ 3.43
## 20 VILAS R 13 BEVERIDGE I 3.28
## 21 POZIO E 12 BLAIR D 3.17
## 22 TAKEUCHI T 12 EBERT F 3.00
## 23 ALOTHMAN Z 11 FOLEY DH 3.00
## 24 CUPOLILLO E 11 TAKAFUJI A 3.00
## 25 GRIMALDI G 11 DUJARDIN JP 2.91
## 26 KOBAYASHI S 11 MONIS PT 2.87
## 27 KREUTZER RD 11 NAVAJAS M 2.87
## 28 SNABEL V 11 BRENIERE SF 2.76
## 29 BOSSENO MF 10 VERDYCK P 2.70
## 30 CAPASSO C 10 MIRELMAN D 2.62
## 31 MAYRHOFER G 10 SANMARTIN ML 2.58
## 32 RIOUX JA 10 CHIARI E 2.46
## 33 TAIT A 10 POZIO E 2.44
## 34 BEVERIDGE I 9 GATTI S 2.42
## 35 BRUNO A 9 SCAGLIA M 2.42
## 36 CLARK CG 9 SCARPASSA VM 2.42
## 37 DEL PRETE S 9 STEVENS JR 2.35
## 38 DUJARDIN JC 9 KREUTZER RD 2.35
## 39 GRADONI L 9 NADLER SA 2.33
## 40 HAQUE R 9 HAQUE R 2.30
## 41 MICHELS PAM 9 SNABEL V 2.28
## 42 OSMAN SM 9 JACKSON TFHG 2.28
## 43 SCOZZAFAVA A 9 LYMBERY AJ 2.25
## 44 SOLARI A 9 MULVEY M 2.25
## 45 TACHIBANA H 9 MAYRHOFER G 2.22
## 46 VARGAS F 9 CABARET J 2.19
## 47 VULLO D 9 MOMEN H 2.18
## 48 YANAGI T 9 GIBSON W 2.17
## 49 AGATSUMA T 8 GRADONI L 2.15
## 50 ALVAR J 8 TAKEUCHI T 2.09
## 51 AYALA FJ 8 AGATSUMA T 2.03
## 52 CABARET J 8 DIAMOND LS 2.03
## 53 CEVINI C 8 EBERT D 2.03
## 54 CIPRIANI P 8 MCMANUS DP 2.03
## 55 FOLEY DH 8 OROZCO E 2.03
## 56 GOKA K 8 KOBAYASHI S 2.02
## 57 HASHIGUCHI Y 8 SCOZZAFAVA A 2.02
## 58 HATAM GR 8 DUFFY JE 2.00
## 59 MOMEN H 8 RANNALA BH 2.00
## 60 MONIS PT 8 TAIT A 1.94
## 61 NAVAJAS M 8 GODFREY DG 1.92
## 62 PAOLETTI M 8 BRYAN JH 1.92
## 63 RENAUD F 8 ARRIVILLAGA J 1.89
## 64 SANMARTIN ML 8 OPPERDOES FR 1.87
## 65 ANDRADE SG 7 SARGEAUNT PG 1.87
## 66 BANULS AL 7 DESSER SS 1.83
## 67 DARDE ML 7 LITTLE TJ 1.83
## 68 DEREURE J 7 SOLARI A 1.80
## 69 EY PL 7 MONTEIRO FA 1.78
## 70 FERNANDES O 7 YAMAZAKI Y 1.75
## 71 FIGUEIREDO FB 7 BOSSENO MF 1.72
## 72 GUHL F 7 ANDRADE SG 1.72
## 73 LANOTTE G 7 VRIJENHOEK RC 1.70
## 74 OPPERDOES FR 7 WILLIAMS JE 1.70
## 75 PASTEUR N 7 GRIMALDI G 1.70
## 76 PESSON B 7 STOUTHAMER R 1.67
## 77 PETRI WA 7 CUPOLILLO E 1.66
## 78 RAVEL C 7 HATAM GR 1.64
## 79 STEVENS JR 7 DARDE ML 1.62
## 80 TAKAFUJI A 7 PASTEUR N 1.62
## 81 AREVALO J 6 PHILLIPS CB 1.59
## 82 BLAIR D 6 EDWARDS DD 1.53
## 83 CARTA F 6 D'AMELIO S 1.51
## 84 GODFREY DG 6 BANDONI SM 1.50
## 85 GOMEZ EA 6 BURDON JJ 1.50
## 86 JACKSON TFHG 6 CLAY K 1.50
## 87 MADEIRA MF 6 CROFT BA 1.50
## 88 MIMORI T 6 DE SOUSA MA 1.50
## 89 MIRELMAN D 6 THOMPSON RCA 1.50
## 90 MONTEIRO FA 6 EY PL 1.49
## 91 MOTAZEDIAN MH 6 GIBSON WC 1.48
## 92 MULVEY M 6 MICHELS PAM 1.48
## 93 MURTA SMF 6 TOMAVO S 1.48
## 94 READY PD 6 TACHIBANA H 1.45
## 95 RODRIGUEZ-PAEZ L 6 ILINE II 1.45
## 96 SARGEAUNT PG 6 LUN ZR 1.45
## 97 SCHOFIELD CJ 6 TRUC P 1.44
## 98 SCHONIAN G 6 PESSON B 1.44
## 99 SITHITHAWORN P 6 BLANCO A 1.43
## 100 STEINDEL M 6 MATHIEUDAUDE F 1.43
##
##
## Top manuscripts per citations
##
## Paper TC TCperYear
## 1 FORSTERMANN U, 1994, HYPERTENSION 815 33.96
## 2 LINHART YB, 1996, ANNU REV ECOL SYST 798 36.27
## 3 ZINGALES B, 2009, MEM INST OSWALDO CRUZ 505 56.11
## 4 RIOUX JA, 1990, ANNALES DE PARASITOLOGIE HUMAINE ET COMPAREE 425 15.18
## 5 SUPURAN CT, 2010, BIOORG MED CHEM LETT 414 51.75
## 6 DIAMOND LS, 1993, J EUKARYOT MICROBIOL 381 15.24
## 7 SUPURAN CT, 2007, BIOORG MED CHEM 358 32.55
## 8 ARNAUD-HAOND S, 2007, MOL ECOL 336 30.55
## 9 TIBAYRENC M, 1991, PROC NATL ACAD SCI U S A 325 12.04
## 10 SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG-a 298 7.45
## 11 SUPURAN CT, 2008, CURR PHARM DESIGN 276 27.60
## 12 MAGILL AJ, 1993, N ENGL J MED 262 10.48
## 13 TIBAYRENC M, 1988, EVOLUTION 261 8.70
## 14 MILES MA, 1977, TRANS ROY SOC TROP MED HYG 256 6.24
## 15 COLLINS FH, 1996, INSECT MOL BIOL 254 11.55
## 16 MACHADO CA, 2001, PROC NATL ACAD SCI U S A 247 14.53
## 17 HAMPL V, 2001, INT J SYST EVOL MICROBIOL 237 13.94
## 18 BARRY M, 1997, CLIN PHARMACOKINET 236 11.24
## 19 MONIS PT, 1999, MOL BIOL EVOL 197 10.37
## 20 MATTIUCCI S, 1997, J PARASITOL 197 9.38
## 21 DARDE ML, 1992, J PARASITOL 197 7.58
## 22 MACEDO AM, 2004, MEM INST OSWALDO CRUZ 194 13.86
## 23 MONIS PT, 2003, INFECT GENET EVOL 189 12.60
## 24 MOLLER AP, 1998, BEHAV ECOL SOCIOBIOL 188 9.40
## 25 BLACK WC, 1992, BULL ENTOMOL RES 187 7.19
## 26 PATEL MS, 2006, BIOCHEM SOC TRANS 181 15.08
## 27 KREUTZER RD, 1980, AM J TROP MED HYG 176 4.63
## 28 CLARK CG, 1991, MOL BIOCHEM PARASITOL 171 6.33
## 29 HAQUE R, 1998, J CLIN MICROBIOL 166 8.30
## 30 SOLTIS DE, 1991, AM J BOT-a 166 6.15
## 31 DYBDAHL MF, 1996, EVOLUTION 165 7.50
## 32 ZARLENGA DS, 1999, INT J PARASIT 162 8.53
## 33 GOODWIN SB, 1995, PLANT DIS 162 7.04
## 34 CUPOLILLO E, 1995, MOL BIOCHEM PARASITOL 162 7.04
## 35 BLUM J, 2004, J ANTIMICROB CHEMOTHER 158 11.29
## 36 STEVENS JR, 1999, PARASITOLOGY 158 8.32
## 37 LIM K, 1994, PROTEIN SCI 158 6.58
## 38 MUNDERLOH UG, 1994, J PARASITOL 157 6.54
## 39 HOMAN WL, 2001, INT J PARASIT 156 9.18
## 40 LOXDALE HD, 1998, BULL ENTOMOL RES 154 7.70
## 41 BOWLES J, 1993, ACTA TROP 153 6.12
## 42 SCHWARZ D, 2005, NATURE 151 11.62
## 43 TIBAYRENC M, 1998, INT J PARASIT 150 7.50
## 44 BURDON JJ, 1993, ANNU REV PHYTOPATHOL 140 5.60
## 45 LEHMANN T, 1996, HEREDITY 138 6.27
## 46 EBERT D, 1998, PROC R SOC B-BIOL SCI 137 6.85
## 47 LEHMANN T, 1998, MOL BIOL EVOL 134 6.70
## 48 BARNABE C, 2000, PARASITOLOGY 133 7.39
## 49 SCOZZAFAVA A, 2006, EXPERT OPIN THER PATENTS 130 10.83
## 50 MATTIUCCI S, 2006, PARASITE-J SOC FR PARASITOL 130 10.83
## 51 LESSA EP, 1998, MOL PHYLOGENET EVOL 126 6.30
## 52 THOMAS Y, 2003, EVOLUTION 124 8.27
## 53 ZINGALES B, 1998, INT J PARASIT 123 6.15
## 54 MURTA SMF, 1998, MOL BIOCHEM PARASITOL 122 6.10
## 55 REVOLLO S, 1998, EXP PARASITOL 120 6.00
## 56 HAQUE R, 1995, J CLIN MICROBIOL 120 5.22
## 57 ARNAUD-HAOND S, 2005, J HERED 115 8.85
## 58 CAMERON P, 2004, J IMMUNOL 115 8.21
## 59 BAYMAN P, 1991, CAN J BOT -REV CAN BOT 112 4.15
## 60 SMITH MA, 2008, MOL ECOL RESOUR 111 11.10
## 61 NEFF BD, 2001, EVOLUTION 111 6.53
## 62 ZIJLSTRA C, 1995, PHYTOPATHOLOGY 111 4.83
## 63 KREUTZER RD, 1983, AM J TROP MED HYG 111 3.17
## 64 SUPURAN CT, 2007, CURR TOP MED CHEM 110 10.00
## 65 TANNICH E, 1991, J CLIN MICROBIOL 110 4.07
## 66 JACOBSON RL, 2003, J INFECT DIS 109 7.27
## 67 ROSENTHAL E, 1995, TRANS ROY SOC TROP MED HYG 109 4.74
## 68 SAMUELSON J, 1991, J EXP MED 109 4.04
## 69 MAURICIO IL, 2006, INT J PARASIT 108 9.00
## 70 EY PL, 1997, J EUKARYOT MICROBIOL 108 5.14
## 71 NEVO E, 1998, GENET RESOUR CROP EVOL 107 5.35
## 72 BESANSKY NJ, 1997, GENETICS 107 5.10
## 73 ANDERSON TJC, 1993, PARASITOLOGY 107 4.28
## 74 MIRELMAN D, 1986, INFECT IMMUN 104 3.25
## 75 SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG 104 2.60
## 76 ANTINORI S, 2007, CLIN INFECT DIS 103 9.36
## 77 JARNE P, 1993, BIOL J LINNEAN SOC 103 4.12
## 78 ACUNASOTO R, 1993, AM J TROP MED HYG 103 4.12
## 79 VALENTINI A, 2006, J PARASITOL 102 8.50
## 80 NAVAJAS M, 2000, EXP APPL ACAROL-a 102 5.67
## 81 CARRASCO HJ, 1996, AM J TROP MED HYG 101 4.59
## 82 NADLER SA, 2011, PARASITOLOGY 99 14.14
## 83 SCHWENKENBECHER JM, 2006, INT J PARASIT 98 8.17
## 84 SUPURAN CT, 2010, CURR PHARM DESIGN 97 12.12
## 85 WURGLER FE, 1992, MUTAGENESIS 96 3.69
## 86 MACLEOD A, 2000, PROC NATL ACAD SCI U S A 95 5.28
## 87 MARTIN FN, 2000, MYCOLOGIA 95 5.28
## 88 BENNETT JW, 2013, N ENGL J MED 94 18.80
## 89 TOLEDO MJD, 2003, ANTIMICROB AGENTS CHEMOTHER 94 6.27
## 90 LEE JY, 1998, BIOCHEMISTRY 93 4.65
## 91 GASKIN AA, 2002, J VET INTERN MED 91 5.69
## 92 OUDEMANS P, 1991, MYCOL RES 91 3.37
## 93 MIRELMAN D, 1986, EXP PARASITOL 91 2.84
## 94 KUHLS K, 2005, MICROBES INFECT 90 6.92
## 95 PRATLONG F, 2004, J CLIN MICROBIOL 89 6.36
## 96 JERONIMO SMB, 1994, TRANS ROY SOC TROP MED HYG 89 3.71
## 97 MATTIUCCI S, 2002, SYST PARASITOL 88 5.50
## 98 GRACE JM, 1998, DRUG METAB DISPOS 88 4.40
## 99 MONIS PT, 1998, PARASITOLOGY 88 4.40
## 100 ANDERSON TJC, 1997, PARASITOLOGY 88 4.19
##
##
## Most Productive Countries (of corresponding authors)
##
## Country Articles Freq SCP MCP MCP_Ratio
## 1 USA 189 0.130525 134 55 0.2910
## 2 BRAZIL 137 0.094613 100 37 0.2701
## 3 UNITED KINGDOM 134 0.092541 76 58 0.4328
## 4 FRANCE 110 0.075967 69 41 0.3727
## 5 ITALY 100 0.069061 56 44 0.4400
## 6 JAPAN 67 0.046271 43 24 0.3582
## 7 AUSTRALIA 66 0.045580 53 13 0.1970
## 8 SPAIN 60 0.041436 34 26 0.4333
## 9 GERMANY 39 0.026934 24 15 0.3846
## 10 MEXICO 31 0.021409 24 7 0.2258
## 11 CANADA 30 0.020718 23 7 0.2333
## 12 INDIA 27 0.018646 22 5 0.1852
## 13 BELGIUM 26 0.017956 11 15 0.5769
## 14 SWITZERLAND 24 0.016575 14 10 0.4167
## 15 ARGENTINA 22 0.015193 16 6 0.2727
## 16 CHINA 19 0.013122 16 3 0.1579
## 17 COLOMBIA 19 0.013122 14 5 0.2632
## 18 KENYA 18 0.012431 11 7 0.3889
## 19 VENEZUELA 18 0.012431 11 7 0.3889
## 20 IRAN 17 0.011740 13 4 0.2353
## 21 THAILAND 17 0.011740 4 13 0.7647
## 22 BOLIVIA 14 0.009669 3 11 0.7857
## 23 EGYPT 14 0.009669 11 3 0.2143
## 24 NEW ZEALAND 13 0.008978 12 1 0.0769
## 25 ISRAEL 12 0.008287 6 6 0.5000
## 26 NETHERLANDS 12 0.008287 3 9 0.7500
## 27 CZECH REPUBLIC 11 0.007597 6 5 0.4545
## 28 AUSTRIA 10 0.006906 8 2 0.2000
## 29 CHILE 10 0.006906 5 5 0.5000
## 30 SLOVAKIA 10 0.006906 0 10 1.0000
## 31 TURKEY 10 0.006906 8 2 0.2000
## 32 PORTUGAL 8 0.005525 1 7 0.8750
## 33 SOUTH AFRICA 8 0.005525 6 2 0.2500
## 34 TUNISIA 8 0.005525 1 7 0.8750
## 35 POLAND 7 0.004834 5 2 0.2857
## 36 SWEDEN 7 0.004834 5 2 0.2857
## 37 CAMEROON 6 0.004144 2 4 0.6667
## 38 DENMARK 6 0.004144 3 3 0.5000
## 39 ETHIOPIA 6 0.004144 1 5 0.8333
## 40 MOROCCO 6 0.004144 3 3 0.5000
## 41 FINLAND 5 0.003453 2 3 0.6000
## 42 IRAQ 5 0.003453 2 3 0.6000
## 43 SUDAN 5 0.003453 0 5 1.0000
## 44 GEORGIA 4 0.002762 2 2 0.5000
## 45 HUNGARY 4 0.002762 4 0 0.0000
## 46 KOREA 4 0.002762 3 1 0.2500
## 47 MALAYSIA 4 0.002762 3 1 0.2500
## 48 PANAMA 4 0.002762 3 1 0.2500
## 49 RUSSIA 4 0.002762 3 1 0.2500
## 50 TAIWAN 4 0.002762 0 4 1.0000
## 51 UGANDA 4 0.002762 2 2 0.5000
## 52 ALGERIA 3 0.002072 1 2 0.6667
## 53 PARAGUAY 3 0.002072 1 2 0.6667
## 54 SERBIA 3 0.002072 2 1 0.3333
## 55 URUGUAY 3 0.002072 2 1 0.3333
## 56 ZIMBABWE 3 0.002072 0 3 1.0000
## 57 BANGLADESH 2 0.001381 1 1 0.5000
## 58 BULGARIA 2 0.001381 2 0 0.0000
## 59 COSTA RICA 2 0.001381 2 0 0.0000
## 60 CROATIA 2 0.001381 0 2 1.0000
## 61 ECUADOR 2 0.001381 1 1 0.5000
## 62 GREECE 2 0.001381 0 2 1.0000
## 63 IRELAND 2 0.001381 1 1 0.5000
## 64 LEBANON 2 0.001381 2 0 0.0000
## 65 MALTA 2 0.001381 1 1 0.5000
## 66 PERU 2 0.001381 1 1 0.5000
## 67 ROMANIA 2 0.001381 2 0 0.0000
## 68 SAUDI ARABIA 2 0.001381 0 2 1.0000
## 69 SRI LANKA 2 0.001381 0 2 1.0000
## 70 YEMEN 2 0.001381 0 2 1.0000
## 71 BAHAMAS 1 0.000691 0 1 1.0000
## 72 BAHRAIN 1 0.000691 1 0 0.0000
## 73 BURKINA FASO 1 0.000691 0 1 1.0000
## 74 ESTONIA 1 0.000691 1 0 0.0000
## 75 GUATEMALA 1 0.000691 1 0 0.0000
## 76 MAURITANIA 1 0.000691 1 0 0.0000
## 77 NIGERIA 1 0.000691 1 0 0.0000
## 78 PAKISTAN 1 0.000691 0 1 1.0000
## 79 SLOVENIA 1 0.000691 1 0 0.0000
## 80 ZAMBIA 1 0.000691 0 1 1.0000
##
##
## SCP: Single Country Publications
##
## MCP: Multiple Country Publications
##
##
## Total Citations per Country
##
## Country Total Citations Average Article Citations
## 1 USA 8362 44.24
## 2 UNITED KINGDOM 5177 38.63
## 3 FRANCE 4789 43.54
## 4 ITALY 4070 40.70
## 5 BRAZIL 3603 26.30
## 6 AUSTRALIA 2204 33.39
## 7 GERMANY 1863 47.77
## 8 JAPAN 1224 18.27
## 9 SPAIN 927 15.45
## 10 CANADA 868 28.93
## 11 SWITZERLAND 844 35.17
## 12 ISRAEL 579 48.25
## 13 BELGIUM 563 21.65
## 14 PORTUGAL 532 66.50
## 15 NETHERLANDS 483 40.25
## 16 BOLIVIA 471 33.64
## 17 CZECH REPUBLIC 463 42.09
## 18 MEXICO 438 14.13
## 19 KENYA 359 19.94
## 20 ARGENTINA 343 15.59
## 21 THAILAND 341 20.06
## 22 COLOMBIA 290 15.26
## 23 SWEDEN 278 39.71
## 24 VENEZUELA 262 14.56
## 25 INDIA 259 9.59
## 26 CHILE 213 21.30
## 27 PANAMA 206 51.50
## 28 NEW ZEALAND 204 15.69
## 29 IRAN 200 11.76
## 30 CHINA 182 9.58
## 31 DENMARK 156 26.00
## 32 URUGUAY 155 51.67
## 33 SOUTH AFRICA 152 19.00
## 34 AUSTRIA 147 14.70
## 35 SLOVAKIA 129 12.90
## 36 TURKEY 124 12.40
## 37 ETHIOPIA 121 20.17
## 38 TUNISIA 105 13.12
## 39 FINLAND 101 20.20
## 40 SRI LANKA 92 46.00
## 41 IRAQ 90 18.00
## 42 ECUADOR 82 41.00
## 43 BANGLADESH 77 38.50
## 44 MALAYSIA 76 19.00
## 45 EGYPT 73 5.21
## 46 SUDAN 72 14.40
## 47 MOROCCO 71 11.83
## 48 UGANDA 65 16.25
## 49 GEORGIA 59 14.75
## 50 POLAND 58 8.29
## 51 ALGERIA 52 17.33
## 52 KOREA 52 13.00
## 53 CAMEROON 48 8.00
## 54 PERU 48 24.00
## 55 ZIMBABWE 47 15.67
## 56 GREECE 42 21.00
## 57 TAIWAN 41 10.25
## 58 BURKINA FASO 36 36.00
## 59 MALTA 34 17.00
## 60 IRELAND 31 15.50
## 61 HUNGARY 23 5.75
## 62 LEBANON 22 11.00
## 63 SAUDI ARABIA 17 8.50
## 64 CROATIA 14 7.00
## 65 SERBIA 13 4.33
## 66 PAKISTAN 12 12.00
## 67 PARAGUAY 12 4.00
## 68 ROMANIA 12 6.00
## 69 ZAMBIA 12 12.00
## 70 BAHRAIN 9 9.00
## 71 COSTA RICA 9 4.50
## 72 GUATEMALA 9 9.00
## 73 YEMEN 9 4.50
## 74 BAHAMAS 8 8.00
## 75 ESTONIA 5 5.00
## 76 RUSSIA 4 1.00
## 77 SLOVENIA 2 2.00
## 78 BULGARIA 1 0.50
## 79 MAURITANIA 1 1.00
## 80 NIGERIA 1 1.00
##
##
## Most Relevant Sources
##
## Sources Articles
## 1 TRANSACTIONS OF THE ROYAL SOCIETY OF TROPICAL MEDICINE AND HYGIENE 72
## 2 AMERICAN JOURNAL OF TROPICAL MEDICINE AND HYGIENE 70
## 3 PARASITOLOGY 62
## 4 INTERNATIONAL JOURNAL FOR PARASITOLOGY 55
## 5 MEMORIAS DO INSTITUTO OSWALDO CRUZ 49
## 6 JOURNAL OF PARASITOLOGY 47
## 7 PARASITOLOGY RESEARCH 46
## 8 ACTA TROPICA 42
## 9 MOLECULAR AND BIOCHEMICAL PARASITOLOGY 37
## 10 EXPERIMENTAL PARASITOLOGY 34
## 11 JOURNAL OF MEDICAL ENTOMOLOGY 24
## 12 ANNALS OF TROPICAL MEDICINE AND PARASITOLOGY 22
## 13 INFECTION GENETICS AND EVOLUTION 16
## 14 JOURNAL OF CLINICAL MICROBIOLOGY 16
## 15 VETERINARY PARASITOLOGY 16
## 16 MEDICAL AND VETERINARY ENTOMOLOGY 15
## 17 SYSTEMATIC PARASITOLOGY 15
## 18 EVOLUTION 14
## 19 PARASITE-JOURNAL DE LA SOCIETE FRANCAISE DE PARASITOLOGIE 14
## 20 HEREDITY 13
## 21 BIOORGANIC \\& MEDICINAL CHEMISTRY 12
## 22 JOURNAL OF EUKARYOTIC MICROBIOLOGY 11
## 23 MOLECULAR ECOLOGY 10
## 24 ANNALS OF THE ENTOMOLOGICAL SOCIETY OF AMERICA 9
## 25 APPLIED ENTOMOLOGY AND ZOOLOGY 9
## 26 BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY 9
## 27 EXPERIMENTAL AND APPLIED ACAROLOGY 9
## 28 JOURNAL OF ENZYME INHIBITION AND MEDICINAL CHEMISTRY 9
## 29 JOURNAL OF PROTOZOOLOGY 9
## 30 PLOS ONE 9
## 31 TROPICAL MEDICINE \\& INTERNATIONAL HEALTH 9
## 32 BULLETIN OF ENTOMOLOGICAL RESEARCH 8
## 33 COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY B-BIOCHEMISTRY \\& MOLECULAR BIOLOGY 8
## 34 EXPERIMENTAL \\& APPLIED ACAROLOGY 8
## 35 HELMINTHOLOGIA 8
## 36 PHYTOPATHOLOGY 8
## 37 PLANT DISEASE 8
## 38 REVISTA DA SOCIEDADE BRASILEIRA DE MEDICINA TROPICAL 8
## 39 BIOORGANIC \\& MEDICINAL CHEMISTRY LETTERS 7
## 40 CLINICAL INFECTIOUS DISEASES 7
## 41 JOURNAL OF INFECTIOUS DISEASES 7
## 42 JOURNAL OF NEMATOLOGY 7
## 43 MALACOLOGIA 7
## 44 JOURNAL OF HELMINTHOLOGY 6
## 45 NEMATOLOGY 6
## 46 PARASITES \\& VECTORS 6
## 47 PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 6
## 48 AMERICAN JOURNAL OF BOTANY 5
## 49 ANNALES DE PARASITOLOGIE HUMAINE ET COMPAREE 5
## 50 BIOCHEMICAL SYSTEMATICS AND ECOLOGY 5
## 51 EUROPEAN JOURNAL OF BIOCHEMISTRY 5
## 52 FISHERIES RESEARCH 5
## 53 ICOPA IX - 9TH INTERNATIONAL CONGRESS OF PARASITOLOGY 5
## 54 JOURNAL OF EVOLUTIONARY BIOLOGY 5
## 55 JOURNAL OF HEREDITY 5
## 56 JOURNAL OF THE AMERICAN MOSQUITO CONTROL ASSOCIATION 5
## 57 MYCOLOGICAL RESEARCH 5
## 58 PARASITE 5
## 59 PARASITOLOGY INTERNATIONAL 5
## 60 ARCHIVES OF MEDICAL RESEARCH 4
## 61 BIOCHEMICAL GENETICS 4
## 62 BIOCHEMISTRY 4
## 63 BMC INFECTIOUS DISEASES 4
## 64 CANADIAN JOURNAL OF ZOOLOGY-REVUE CANADIENNE DE ZOOLOGIE 4
## 65 FEMS MICROBIOLOGY LETTERS 4
## 66 GENETICA 4
## 67 INFECTION AND IMMUNITY 4
## 68 INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY 4
## 69 INTERNATIONAL JOURNAL OF DERMATOLOGY 4
## 70 JOURNAL OF BIOLOGICAL CHEMISTRY 4
## 71 JOURNAL OF VECTOR ECOLOGY 4
## 72 MOLECULAR BIOLOGY AND EVOLUTION 4
## 73 PARASITOLOGY TODAY 4
## 74 PLANT PATHOLOGY 4
## 75 PLOS NEGLECTED TROPICAL DISEASES 4
## 76 PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES 4
## 77 TROPENMEDIZIN UND PARASITOLOGIE 4
## 78 TROPICAL MEDICINE AND PARASITOLOGY 4
## 79 ACTA PARASITOLOGICA 3
## 80 ANTIMICROBIAL AGENTS AND CHEMOTHERAPY 3
## 81 BEHAVIORAL ECOLOGY AND SOCIOBIOLOGY 3
## 82 BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS 3
## 83 BIOCHEMICAL PHARMACOLOGY 3
## 84 BIOLOGICAL CONTROL 3
## 85 BULLETIN DE LA SOCIETE DE PATHOLOGIE EXOTIQUE 3
## 86 CANADIAN JOURNAL OF BOTANY-REVUE CANADIENNE DE BOTANIQUE 3
## 87 CHINESE MEDICAL JOURNAL 3
## 88 COMPTES RENDUS DE L ACADEMIE DES SCIENCES SERIE III-SCIENCES DE LA VIE-LIFE SCIENCES 3
## 89 ENTOMOLOGIA EXPERIMENTALIS ET APPLICATA 3
## 90 EUROPEAN JOURNAL OF ENTOMOLOGY 3
## 91 GENETICS 3
## 92 INDIAN JOURNAL OF MEDICAL RESEARCH 3
## 93 INDIAN JOURNAL OF MEDICAL RESEARCH SECTION A-INFECTIOUS DISEASES 3
## 94 INFECTIOUS AGENTS AND DISEASE-REVIEWS ISSUES AND COMMENTARY 3
## 95 INSECT SCIENCE AND ITS APPLICATION 3
## 96 INSECTES SOCIAUX 3
## 97 INTERNATIONAL JOURNAL OF FOOD MICROBIOLOGY 3
## 98 INVESTIGACION CLINICA 3
## 99 JAPANESE JOURNAL OF APPLIED ENTOMOLOGY AND ZOOLOGY 3
## 100 JOURNAL OF FISH BIOLOGY 3
##
##
## Most Relevant Keywords
##
## Author Keywords (DE) Articles Keywords-Plus (ID) Articles
## 1 ALLOZYMES 83 IDENTIFICATION 179
## 2 TRYPANOSOMA CRUZI 75 DIFFERENTIATION 110
## 3 ISOENZYMES 53 POPULATIONS 103
## 4 ISOENZYME 52 DNA 87
## 5 ELECTROPHORESIS 42 CHAGAS-DISEASE 78
## 6 POPULATION GENETICS 40 STRAINS 78
## 7 LEISHMANIA 36 EVOLUTION 72
## 8 TAXONOMY 36 INFECTION 68
## 9 PHYLOGENY 32 VARIABILITY 67
## 10 LEISHMANIASIS 31 CUTANEOUS LEISHMANIASIS 66
## 11 LEISHMANIA INFANTUM 30 ELECTROPHORESIS 60
## 12 ALLOZYME 29 ISOENZYME PATTERNS 55
## 13 CARBONIC ANHYDRASE 28 BRAZIL 49
## 14 EPIDEMIOLOGY 26 VISCERAL LEISHMANIASIS 47
## 15 CHAGAS DISEASE 25 ISOENZYME 46
## 16 PCR 25 PARASITES 46
## 17 ISOZYME 23 EXPRESSION 44
## 18 ISOZYMES 23 NATURAL-POPULATIONS 44
## 19 GENE FLOW 22 ZYMODEMES 43
## 20 GENETIC VARIATION 22 DIVERSITY 42
## 21 ZYMODEME 22 POPULATION 38
## 22 DIAGNOSIS 21 DIAGNOSIS 37
## 23 GENETIC DIVERSITY 21 DIPTERA 37
## 24 RAPD 21 COMPLEX 36
## 25 ENTAMOEBA HISTOLYTICA 20 RESISTANCE 36
## 26 MALARIA 20 POPULATION-STRUCTURE 35
## 27 BRAZIL 19 AGENT 34
## 28 CUTANEOUS LEISHMANIASIS 19 ALLOZYME 34
## 29 SYSTEMATICS 19 PLASMODIUM-FALCIPARUM 34
## 30 ALLOZYME ELECTROPHORESIS 17 TRANSMISSION 34
## 31 CHARACTERIZATION 17 TRYPANOSOMA-CRUZI 34
## 32 MORPHOLOGY 17 PATTERNS 33
## 33 SPECIATION 17 POLYMERASE CHAIN-REACTION 32
## 34 ISOENZYME ELECTROPHORESIS 16 STOCKS 32
## 35 POLYMORPHISM 16 POLYMERASE-CHAIN-REACTION 31
## 36 RESISTANCE 16 POPULATION-GENETICS 31
## 37 ENTAMOEBA DISPAR 15 SEQUENCES 31
## 38 MITOCHONDRIAL DNA 15 AMPLIFICATION 30
## 39 GENETIC VARIABILITY 14 MITOCHONDRIAL-DNA 30
## 40 LEISHMANIA TROPICA 14 POLYMORPHISM 30
## 41 MICROSATELLITES 14 MONOCLONAL-ANTIBODIES 29
## 42 PARASITE 14 CULICIDAE 28
## 43 POPULATION STRUCTURE 14 ENTAMOEBA-HISTOLYTICA 28
## 44 VISCERAL LEISHMANIASIS 14 PURIFICATION 28
## 45 GENETIC 13 SEQUENCE 28
## 46 GENETIC STRUCTURE 13 KINETOPLAST DNA 27
## 47 GENETICS 13 MARKERS 27
## 48 LEISHMANIA DONOVANI 13 MICE 27
## 49 TRYPANOSOMA BRUCEI 13 EPIDEMIOLOGY 26
## 50 ESTERASE 12 HOST 26
## 51 EVOLUTION 12 ISOENZYME CHARACTERIZATION 26
## 52 POPULATION 12 LEISHMANIASIS 26
## 53 PROTOZOA 12 PARASITIC PROTOZOA 26
## 54 SULFONAMIDE 12 SYSTEMATICS 26
## 55 ZYMODEMES 12 CLONES 25
## 56 AMEBIASIS 11 OLD-WORLD 25
## 57 CHAGAS' DISEASE 11 VIRULENCE 25
## 58 LEISHMANIA MAJOR 11 DISTANCE 24
## 59 TETRANYCHUS URTICAE 11 AMEBIASIS 23
## 60 HYBRIDIZATION 10 BRUCEI 23
## 61 IRAN 10 ESCHERICHIA-COLI 23
## 62 ISOENZYME ANALYSIS 10 GENETIC-VARIATION 23
## 63 MOLECULAR 10 PARASITE 23
## 64 MORPHOMETRICS 10 LEISHMANIA 22
## 65 NEMATODE 10 PSYCHODIDAE 22
## 66 POLYMERASE CHAIN REACTION 10 RIBOSOMAL DNA 22
## 67 SIBLING SPECIES 10 AMPLIFIED POLYMORPHIC DNA 21
## 68 TRYPANOSOMA-BRUCEI 10 DISEASE 21
## 69 TRYPANOSOMA-CRUZI 10 GENE 21
## 70 COLOMBIA 9 ISOZYME-II 21
## 71 DNA 9 METABOLISM 21
## 72 HETEROZYGOSITY 9 PCR 21
## 73 IDENTIFICATION 9 PROTEIN 21
## 74 INHIBITOR 9 SPECIATION 21
## 75 MULTILOCUS ENZYME ELECTROPHORESIS 9 CLASSIFICATION 20
## 76 RFLP 9 DONOVANI 20
## 77 CHAGAS 8 INFANTUM 20
## 78 CRYPTIC SPECIES 8 INVITRO 20
## 79 ENTAMOEBA-HISTOLYTICA 8 POLYMORPHISMS 20
## 80 GENETIC EXCHANGE 8 FLOW 19
## 81 HYMENOPTERA 8 GENES 19
## 82 METALLOENZYMES 8 ASCARIDOIDEA 18
## 83 MICROSATELLITE 8 CLONING 18
## 84 MOLECULAR SYSTEMATICS 8 PROTEINS 18
## 85 PATHOGENICITY 8 TAXONOMY 18
## 86 SELECTION 8 CRUZI 17
## 87 SPECIES COMPLEX 8 CRYSTAL-STRUCTURE 17
## 88 TRIATOMA INFESTANS 8 ENZYMES 17
## 89 VARIATION 8 ISOENZYME ELECTROPHORESIS 17
## 90 BOLIVIA 7 SUBGENUS TRYPANOZOON 17
## 91 CHINA 7 ACTIVE-SITE 16
## 92 COEVOLUTION 7 ENZYME 16
## 93 DISEASE 7 ISOENZYME ANALYSIS 16
## 94 DOG 7 ISOZYME 16
## 95 DRUG RESISTANCE 7 MOSQUITOS 16
## 96 ETHIOPIA 7 RIBOSOMAL-RNA 16
## 97 GENETIC DIFFERENTIATION 7 SANDFLIES 16
## 98 GENETIC DISTANCE 7 ARBITRARY PRIMERS 15
## 99 GIARDIA 7 GENETIC DIFFERENTIATION 15
## 100 GLYCOLYSIS 7 GENUS 15
We may want to save the top 10 publishing countries for this dataset.
citations_ana.sum$MostProdCountries
## Country Articles Freq SCP MCP MCP_Ratio
## 1 USA 189 0.130525 134 55 0.2910
## 2 BRAZIL 137 0.094613 100 37 0.2701
## 3 UNITED KINGDOM 134 0.092541 76 58 0.4328
## 4 FRANCE 110 0.075967 69 41 0.3727
## 5 ITALY 100 0.069061 56 44 0.4400
## 6 JAPAN 67 0.046271 43 24 0.3582
## 7 AUSTRALIA 66 0.045580 53 13 0.1970
## 8 SPAIN 60 0.041436 34 26 0.4333
## 9 GERMANY 39 0.026934 24 15 0.3846
## 10 MEXICO 31 0.021409 24 7 0.2258
## 11 CANADA 30 0.020718 23 7 0.2333
## 12 INDIA 27 0.018646 22 5 0.1852
## 13 BELGIUM 26 0.017956 11 15 0.5769
## 14 SWITZERLAND 24 0.016575 14 10 0.4167
## 15 ARGENTINA 22 0.015193 16 6 0.2727
## 16 CHINA 19 0.013122 16 3 0.1579
## 17 COLOMBIA 19 0.013122 14 5 0.2632
## 18 KENYA 18 0.012431 11 7 0.3889
## 19 VENEZUELA 18 0.012431 11 7 0.3889
## 20 IRAN 17 0.011740 13 4 0.2353
## 21 THAILAND 17 0.011740 4 13 0.7647
## 22 BOLIVIA 14 0.009669 3 11 0.7857
## 23 EGYPT 14 0.009669 11 3 0.2143
## 24 NEW ZEALAND 13 0.008978 12 1 0.0769
## 25 ISRAEL 12 0.008287 6 6 0.5000
## 26 NETHERLANDS 12 0.008287 3 9 0.7500
## 27 CZECH REPUBLIC 11 0.007597 6 5 0.4545
## 28 AUSTRIA 10 0.006906 8 2 0.2000
## 29 CHILE 10 0.006906 5 5 0.5000
## 30 SLOVAKIA 10 0.006906 0 10 1.0000
## 31 TURKEY 10 0.006906 8 2 0.2000
## 32 PORTUGAL 8 0.005525 1 7 0.8750
## 33 SOUTH AFRICA 8 0.005525 6 2 0.2500
## 34 TUNISIA 8 0.005525 1 7 0.8750
## 35 POLAND 7 0.004834 5 2 0.2857
## 36 SWEDEN 7 0.004834 5 2 0.2857
## 37 CAMEROON 6 0.004144 2 4 0.6667
## 38 DENMARK 6 0.004144 3 3 0.5000
## 39 ETHIOPIA 6 0.004144 1 5 0.8333
## 40 MOROCCO 6 0.004144 3 3 0.5000
## 41 FINLAND 5 0.003453 2 3 0.6000
## 42 IRAQ 5 0.003453 2 3 0.6000
## 43 SUDAN 5 0.003453 0 5 1.0000
## 44 GEORGIA 4 0.002762 2 2 0.5000
## 45 HUNGARY 4 0.002762 4 0 0.0000
## 46 KOREA 4 0.002762 3 1 0.2500
## 47 MALAYSIA 4 0.002762 3 1 0.2500
## 48 PANAMA 4 0.002762 3 1 0.2500
## 49 RUSSIA 4 0.002762 3 1 0.2500
## 50 TAIWAN 4 0.002762 0 4 1.0000
## 51 UGANDA 4 0.002762 2 2 0.5000
## 52 ALGERIA 3 0.002072 1 2 0.6667
## 53 PARAGUAY 3 0.002072 1 2 0.6667
## 54 SERBIA 3 0.002072 2 1 0.3333
## 55 URUGUAY 3 0.002072 2 1 0.3333
## 56 ZIMBABWE 3 0.002072 0 3 1.0000
## 57 BANGLADESH 2 0.001381 1 1 0.5000
## 58 BULGARIA 2 0.001381 2 0 0.0000
## 59 COSTA RICA 2 0.001381 2 0 0.0000
## 60 CROATIA 2 0.001381 0 2 1.0000
## 61 ECUADOR 2 0.001381 1 1 0.5000
## 62 GREECE 2 0.001381 0 2 1.0000
## 63 IRELAND 2 0.001381 1 1 0.5000
## 64 LEBANON 2 0.001381 2 0 0.0000
## 65 MALTA 2 0.001381 1 1 0.5000
## 66 PERU 2 0.001381 1 1 0.5000
## 67 ROMANIA 2 0.001381 2 0 0.0000
## 68 SAUDI ARABIA 2 0.001381 0 2 1.0000
## 69 SRI LANKA 2 0.001381 0 2 1.0000
## 70 YEMEN 2 0.001381 0 2 1.0000
## 71 BAHAMAS 1 0.000691 0 1 1.0000
## 72 BAHRAIN 1 0.000691 1 0 0.0000
## 73 BURKINA FASO 1 0.000691 0 1 1.0000
## 74 ESTONIA 1 0.000691 1 0 0.0000
## 75 GUATEMALA 1 0.000691 1 0 0.0000
## 76 MAURITANIA 1 0.000691 1 0 0.0000
## 77 NIGERIA 1 0.000691 1 0 0.0000
## 78 PAKISTAN 1 0.000691 0 1 1.0000
## 79 SLOVENIA 1 0.000691 1 0 0.0000
## 80 ZAMBIA 1 0.000691 0 1 1.0000
#bar chart of top 10 countries
df_count<-data.frame(Country=as.character(citations_ana.sum$MostProdCountries$`Country `),Article_count=as.integer(citations_ana.sum$MostProdCountries$Articles)) %>% slice(.,1:10)
ggplot(df_count, aes(Country, Article_count)) +
geom_bar(stat = "identity",fill=brewer.pal(10, "Spectral")) +
coord_flip() +
theme_bw()
#with everyone else category
vec<-as.data.frame(citations_ana$Countries,stringsAsFactors = F) %>% filter(!Tab %in% trimws(as.character(df_count$Country),which = c("both", "left", "right"))) %>% select(.,Freq) %>% sum()
vec2<-data.frame(Country='OTHER',Article_count=as.integer(vec))
df_count<-rbind(df_count,vec2)
ggplot(df_count, aes(Country, Article_count)) +
geom_bar(stat = "identity",fill=brewer.pal(11, "Spectral")) +
coord_flip() +
theme_bw()
#write it out as a table
write.table(citations_ana.sum$MostProdCountries,'TopProducingCountriesForAllozymeParasiteSearch',row.names=F,quote=F,sep='\t')
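The "everyone else" step above boils down to keeping the first few rows and summing whatever is left into one extra row. A minimal sketch of that idea in base R, on invented counts; `top_n_with_other` is a hypothetical helper written for this example and assumes the table is already sorted from most to least productive.

```r
# Invented counts, already sorted from most to least productive
counts <- data.frame(Country = c("USA", "BRAZIL", "FRANCE", "ITALY", "JAPAN"),
                     Article_count = c(10L, 8L, 5L, 2L, 1L),
                     stringsAsFactors = FALSE)

# Hypothetical helper: keep the first n rows and sum the rest into "OTHER"
top_n_with_other <- function(df, n) {
  idx <- seq_len(nrow(df))
  top <- df[idx <= n, ]
  rest <- df[idx > n, ]
  if (nrow(rest) > 0) {
    other <- data.frame(Country = "OTHER",
                        Article_count = sum(rest$Article_count),
                        stringsAsFactors = FALSE)
    top <- rbind(top, other)
  }
  top
}

top3 <- top_n_with_other(counts, 3)
top3
```

With these invented numbers the OTHER row holds 3 articles (2 + 1). No OTHER row is added when there is nothing left over.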
We are interested in how many papers are produced per year, which we can see in the summary object. We can also calculate how long it took to reach a given share of the publications.
#citations_ana.sum$AnnualProduction
#to see when XX % of papers were published
table<-citations_ana.sum$AnnualProduction %>% mutate(cumsum=cumsum(Articles),cumper=cumsum(Articles)/sum(Articles)*100)
table
## Year Articles cumsum cumper
## 1 1966 1 1 0.066357
## 2 1968 1 2 0.132714
## 3 1973 1 3 0.199071
## 4 1974 1 4 0.265428
## 5 1975 3 7 0.464499
## 6 1977 2 9 0.597213
## 7 1978 4 13 0.862641
## 8 1979 3 16 1.061712
## 9 1980 6 22 1.459854
## 10 1981 3 25 1.658925
## 11 1982 9 34 2.256138
## 12 1983 7 41 2.720637
## 13 1984 11 52 3.450564
## 14 1985 5 57 3.782349
## 15 1986 7 64 4.246848
## 16 1987 15 79 5.242203
## 17 1988 5 84 5.573988
## 18 1989 8 92 6.104844
## 19 1990 17 109 7.232913
## 20 1991 67 176 11.678832
## 21 1992 84 260 17.252820
## 22 1993 70 330 21.897810
## 23 1994 67 397 26.343729
## 24 1995 66 463 30.723291
## 25 1996 65 528 35.036496
## 26 1997 72 600 39.814200
## 27 1998 95 695 46.118115
## 28 1999 82 777 51.559390
## 29 2000 67 844 56.005309
## 30 2001 51 895 59.389516
## 31 2002 54 949 62.972794
## 32 2003 57 1006 66.755143
## 33 2004 60 1066 70.736563
## 34 2005 41 1107 73.457200
## 35 2006 57 1164 77.239549
## 36 2007 42 1206 80.026543
## 37 2008 44 1250 82.946251
## 38 2009 40 1290 85.600531
## 39 2010 27 1317 87.392170
## 40 2011 27 1344 89.183809
## 41 2012 19 1363 90.444592
## 42 2013 22 1385 91.904446
## 43 2014 23 1408 93.430657
## 44 2015 28 1436 95.288653
## 45 2016 29 1465 97.213006
## 46 2017 23 1488 98.739217
## 47 2018 19 1507 100.000000
write.table(table,'ProductionPerYearForAllozymeParasiteSearch',row.names=F,quote=F,sep='\t')
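Rather than reading the year off the cumulative table by eye, you could look it up directly. A minimal sketch, assuming the years are numeric (the bibliometrix table may store them as text, in which case convert them first); `year_reaching` and the toy counts are hypothetical.

```r
# Hypothetical helper: first year in which the cumulative share of papers
# reaches at least `pct` percent.
year_reaching <- function(years, articles, pct) {
  ord <- order(years)
  years <- years[ord]
  cumper <- cumsum(articles[ord]) / sum(articles) * 100
  years[which(cumper >= pct)[1]]
}

# Toy counts, not the real search results
yrs <- c(1990, 1991, 1992, 1993)
n   <- c(10, 40, 30, 20)
half   <- year_reaching(yrs, n, 50)   # cumulative percent: 10, 50, 80, 100
ninety <- year_reaching(yrs, n, 90)
c(half, ninety)
```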
#basic line graph
ggplot(citations_ana.sum$AnnualProduction, aes(`Year `,Articles, group=1)) +
geom_line(aes(`Year `,Articles))
ggplot(citations_ana.sum$AnnualProduction, aes(`Year `,Articles, group=1)) +
geom_point(aes(citations_ana.sum$AnnualProduction$`Year `,citations_ana.sum$AnnualProduction$Articles), size = 3,colour='red') +
geom_line(aes(`Year `,Articles)) +
labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
#create some splines to smooth the curve
spline_int <- as.data.frame(spline(citations_ana.sum$AnnualProduction$`Year `, citations_ana.sum$AnnualProduction$Articles))
#just make it look a bit prettier
ggplot(citations_ana.sum$AnnualProduction) +
geom_point(aes(citations_ana.sum$AnnualProduction$`Year `,citations_ana.sum$AnnualProduction$Articles), size = 3) +
geom_line(data = spline_int, aes(x,y)) +
geom_area(data = spline_int, aes(x,y,fill='red'),alpha=0.6) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
scale_fill_manual(labels = "Parasites", values = alpha("red",.6))
We also want to make some word clouds. There are two ways to build them. The first way treats every word in the keyword list individually. There are also two types of keywords: the first is the Author Keywords, the second is the ‘Keywords-Plus’, which is generated by WoS. I'm going to use the Author Keywords, but the lines for the ‘Keywords-Plus’ are included as comments.
For author keywords: citations_ana.sum$MostRelKeywords$`Author Keywords (DE) `
For WoS keywords: citations_ana.sum$MostRelKeywords$`Keywords-Plus (ID) `
Here we will build a new data frame of our keywords and filter out the stuff we don't want before creating a corpus, which is a sort of list used by text mining packages in R. I'm not totally sure what is special about it, but we need it!
I have done this example on the allozyme data and have tried roughly to filter out method-specific terms. If you are in the allozyme group you will need to remove this filter and redo it properly, as I did this very quickly and may have over- or under-filtered!
#careful with this table the biblio function has built a table which has two columns named 'Articles'
head(citations_ana.sum$MostRelKeywords)
## Author Keywords (DE) Articles Keywords-Plus (ID) Articles
## 1 ALLOZYMES 83 IDENTIFICATION 179
## 2 TRYPANOSOMA CRUZI 75 DIFFERENTIATION 110
## 3 ISOENZYMES 53 POPULATIONS 103
## 4 ISOENZYME 52 DNA 87
## 5 ELECTROPHORESIS 42 CHAGAS-DISEASE 78
## 6 POPULATION GENETICS 40 STRAINS 78
colnames(citations_ana.sum$MostRelKeywords)
## [1] "Author Keywords (DE) " "Articles" "Keywords-Plus (ID) " "Articles"
#trim the trailing white space from the keywords and create data frame
forwordcloud<-as.data.frame(cbind(as.character(trimws(citations_ana.sum$MostRelKeywords$`Author Keywords (DE) `, which = c("both", "left", "right"))),citations_ana.sum$MostRelKeywords[2]),stringsAsFactors=FALSE)
colnames(forwordcloud)<-c('keyword','count_papers')
#if we want to plot 'Keywords-Plus (ID)'
#forwordcloud<-as.data.frame(cbind(as.character(trimws(citations_ana.sum$MostRelKeywords$`Keywords-Plus (ID) `, which = c("both", "left", "right"))),citations_ana.sum$MostRelKeywords[4]),stringsAsFactors=FALSE)
#dataframe:
head(forwordcloud)
## keyword count_papers
## 1 ALLOZYMES 83
## 2 TRYPANOSOMA CRUZI 75
## 3 ISOENZYMES 53
## 4 ISOENZYME 52
## 5 ELECTROPHORESIS 42
## 6 POPULATION GENETICS 40
#we want to drop the keywords we searched for from our dataframe
forwordcloud<- forwordcloud %>% filter(!grepl('allozyme|electrophoresis|isoenzyme|isozyme|rapd|carbonic anhydrase|aflp|creatine kinase|protein kinase|alkaline phosphatase|cytochrome P450|glutathione S-transferase|alcohol dehydrogenase|lactate dehydrogenase|catalase|aldehyde dehydrogenase|hexokinase|peroxidase|5 alpha-reductase',keyword,ignore.case = TRUE))
#create corpus
forwordcloud.Corpus<-Corpus(VectorSource(forwordcloud[rep(row.names(forwordcloud), forwordcloud$count_papers), 1]))
#can use the function inspect to display the information on the corpus
#inspect(forwordcloud.Corpus)
#we don't have any special characters but we could remove funky characters if we had them
#these will throw a warning but don't worry, it doesn't mean anything here: it's not dropping documents, the warning appears because I used a vector source for the corpus
#forwordcloud.Corpus<- tm_map(forwordcloud.Corpus, removePunctuation)
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, removeNumbers)
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, stripWhitespace)
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, remove_stopwords) #with package 'tau'
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus,content_transformer(tolower))
#qdap package offers other cleaning functions if we need them
#create wordclouds
wordcloud(forwordcloud.Corpus,scale=c(2.0,.6),max.words=30)
#make it pretty: look up brewer.pal for colour palettes https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf
wordcloud(forwordcloud.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,scale=c(2.0,.6))
You can see that ‘genetic’ and ‘genetics’ come up. You can try to use the stemming function to look for the root of the word but I found this a bit ugly and ended up doing it by hand.
#stemming function
#forwordcloud.Corpus <- tm_map(forwordcloud.Corpus,content_transformer(tolower))
#forwordcloud.doc<- tm_map(forwordcloud.Corpus, stemDocument, "english")
#wordcloud(forwordcloud.doc,colors=brewer.pal(8, "Dark2"))
#code to reduce redundancy by hand
forwordcloud<-forwordcloud %>% mutate(fixkeyword=sub("GENETICS", "GENETIC", keyword))
forwordcloud.Corpus<-Corpus(VectorSource(forwordcloud[rep(row.names(forwordcloud), forwordcloud$count_papers), 3]))
wordcloud(forwordcloud.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,scale=c(1.8,.6))
#you can change the percentage that are rotated with the rot.per call
wordcloud(forwordcloud.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,rot.per=0,scale=c(1.8,.8))
The second way to make a word cloud is to consider the whole phrase a word. I think this makes more sense but it may not be as nice to look at.
wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors="black",max.words=30,scale=c(2.0,.6))
#colours and font
wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors=brewer.pal(8, "Set1"),max.words=30,scale=c(1.5,.6))
wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors=brewer.pal(8, "Dark2"),vfont=c("script","bold"),max.words=30,rot.per=0,scale=c(1.8,.6))
wordcloud(tolower(forwordcloud$keyword),as.numeric(forwordcloud$count_papers), colors=brewer.pal(8, "Dark2"),family = "mono",font = 2,max.words=30,scale=c(1.3,.6))
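When whole phrases are treated as words, variants such as a singular and a plural spelling stay as separate rows unless their counts are added together. A minimal sketch in base R of one way to do that, using a made-up table in the same shape as forwordcloud (the keywords and counts here are illustrative only):

```r
# Toy keyword table in the same shape as forwordcloud (made-up counts)
kw <- data.frame(keyword = c("POPULATION GENETICS", "POPULATION GENETIC",
                             "HOST SPECIFICITY"),
                 count_papers = c(40, 5, 12),
                 stringsAsFactors = FALSE)

# Collapse the variants onto one spelling, then add their counts together
kw$fixkeyword <- sub("GENETICS$", "GENETIC", kw$keyword)
merged <- aggregate(count_papers ~ fixkeyword, data = kw, FUN = sum)
merged <- merged[order(merged$count_papers, decreasing = TRUE), ]
merged
```

The merged table can then be passed to wordcloud in the same way as before, with fixkeyword as the words and count_papers as the frequencies.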
Use this block if you are comparing the parasite searches that you downloaded 500 at a time with the ‘publication per regions’ and ‘publication per year’ files that you downloaded from the WoS website for the general search terms (via the analyse function in WoS).
Bring in the file that you downloaded from WoS and set it up alongside our original file from above:
d1v2<-read.table('../broadersearchers/AllozymePerYearBroadSearch.txt', sep="\t",header=T,row.names=NULL)
head(d1v2)
## Publication Years records_._of_38597
## 1 2019 1 0.003
## 2 2018 432 1.119
## 3 2017 570 1.477
## 4 2016 583 1.510
## 5 2015 580 1.503
## 6 2014 676 1.751
colnames(d1v2) <- c("Year", "ArticlesGeneral","PercentArticles")
d1v2<-d1v2 %>% arrange(.,Year) %>% mutate(PercentPerYearGeneral=cumsum(ArticlesGeneral)/sum(ArticlesGeneral)*100) %>% select(.,-PercentArticles)
d1v2$Year <- as.character(d1v2$Year)
#have to merge them with earlier dataset
df1<-citations_ana.sum$AnnualProduction
colnames(df1)
## [1] "Year " "Articles"
#this is a good example of how not to name columns: the biblio package adds a bunch of trailing white space, which is super frustrating to work around
colnames(df1) <- c("Year", "ArticlesParasite")
df1<-df1 %>% arrange(.,Year) %>% mutate(PercentPerYearParasites=cumsum(ArticlesParasite)/sum(ArticlesParasite)*100)
df1$Year <- as.character(df1$Year)
dmerged<-full_join(d1v2,df1,by='Year')
head(dmerged)
## Year ArticlesGeneral PercentPerYearGeneral ArticlesParasite PercentPerYearParasites
## 1 1960 1 0.002590875 NA NA
## 2 1962 9 0.025908749 NA NA
## 3 1963 16 0.067362748 NA NA
## 4 1964 30 0.145088997 NA NA
## 5 1965 31 0.225406120 NA NA
## 6 1966 45 0.341995492 1 0.066357
#NA should be 0
dmerged[is.na(dmerged)] <- 0
dmerged$Year<-as.integer(dmerged$Year)
#let's drop 2019 because it's a bit of a dumb point (the year is incomplete)
dmerged %>% select(.,ArticlesGeneral,ArticlesParasite,Year) %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 1:2) %>% ggplot(aes(Year, value)) +
geom_point(aes(colour = factor(id)),size = 1) +
geom_line(aes(colour = factor(id))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset",color = "Article Type\n") +
scale_y_continuous(trans='sqrt')
#for splines
dmerged<-dmerged %>% filter(.,Year!=2019)
spline_int <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesParasite))
spline_int2 <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesGeneral))
spline_int$y[spline_int$y < 0] <- 0
ggplot(dmerged) +
geom_point(aes(dmerged$Year,dmerged$ArticlesGeneral), col='red',size = 1) +
geom_point(aes(dmerged$Year,dmerged$ArticlesParasite), col='blue',size = 1) +
geom_line(data = spline_int2, aes(x,y)) +
geom_area(data = spline_int2, aes(x,y,fill='blue')) +
geom_line(data = spline_int, aes(x,y)) +
geom_area(data = spline_int, aes(x,y,fill='red')) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset") +
scale_fill_manual(labels = c("Everyone", "Parasites"), values = alpha(c("red", "blue"),.6)) +
scale_y_continuous(trans='sqrt')
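The renaming, joining and zero-filling steps above can also be done in base R. This is a sketch on made-up tables, assuming only that one of them has a trailing space in its year column name, as the bibliometrix summary does; the object names are hypothetical.

```r
# Toy tables (made-up numbers); note the trailing space in "Year "
general  <- data.frame(Year = c(1990, 1991, 1992), ArticlesGeneral = c(5, 8, 13))
parasite <- data.frame(`Year ` = c(1991, 1992), Articles = c(1, 2),
                       check.names = FALSE)

# Strip stray white space from the column names before joining
colnames(parasite) <- trimws(colnames(parasite))
colnames(parasite)[colnames(parasite) == "Articles"] <- "ArticlesParasite"

# all = TRUE keeps years present in either table (a full join)
both <- merge(general, parasite, by = "Year", all = TRUE)

# Replace NA with 0 in the count columns only
count_cols <- c("ArticlesGeneral", "ArticlesParasite")
both[count_cols][is.na(both[count_cols])] <- 0
both
```

Filling only the count columns leaves the Year column untouched, which is a little safer than replacing every NA in the merged table.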
You can make the same sort of bar graphs for the regions from the file you download from the WoS countries/regions tab.
df_count<-read.table('../broadersearchers/AllozymePerCountryBroadSearch.txt',header=T,row.names=NULL,sep='\t')
head(df_count)
## Countries_Regions records percent_of_38597
## 1 USA 12395 32.114
## 2 JAPAN 3321 8.604
## 3 GERMANY 2305 5.972
## 4 ITALY 2170 5.622
## 5 FRANCE 2000 5.182
## 6 ENGLAND 1991 5.158
#take top 10
df_count %>% arrange(.,desc(records)) %>% slice(.,1:10) %>% ggplot(., aes(Countries_Regions, records)) +
geom_bar(stat = "identity",fill=brewer.pal(10, "Spectral")) +
coord_flip() +
theme_bw()
#with everyone else category
vec<-df_count %>% arrange(.,desc(records)) %>% slice(.,11:nrow(.)) %>% select(.,records) %>% sum()
vec2<-data.frame(Countries_Regions='OTHER',records=as.integer(vec))
df_count<-df_count %>% arrange(.,desc(records)) %>% slice(.,1:10) %>% select(.,-percent_of_38597) %>% rbind(.,vec2)
ggplot(df_count, aes(Countries_Regions, records)) +
geom_bar(stat = "identity",fill=brewer.pal(11, "Spectral")) +
coord_flip() +
theme_bw()
Use this block if you are comparing the parasite searches that you downloaded 500 at a time with the publication per regions and publication per year files that you downloaded using the API.
Bring in the file that you downloaded using the API and set it up alongside our original file from above:
df_api<-read.table('../broadersearchers/AllozymeFromAPI.txt',header=F,row.names=NULL,sep='|',quote="",comment.char="",stringsAsFactors = F)
colnames(df_api) <- c("WoS_id", "Title","Year","Author","Journal","Keywords","Article_type")
head(df_api)
## WoS_id
## 1 WOS:A1994QA56200011
## 2 WOS:A1982NQ93200006
## 3 WOS:A1997WH72500012
## 4 WOS:A1978FT73500009
## 5 WOS:A1991HG72000058
## 6 WOS:000224155500052
## Title
## 1 DETECTION OF A NOVEL LACTATE-DEHYDROGENASE ISOZYME AND AN APPARENT DIFFERENTIATION-ASSOCIATED SHIFT IN ISOZYME PROFILE IN HEPATOMA-CELL LINES
## 2 SPECIES-SPECIFIC OR ISOZYME-SPECIFIC ENZYME-INHIBITORS .4. DESIGN OF A 2-SITE INHIBITOR OF ADENYLATE KINASE WITH ISOENZYME SELECTIVITY
## 3 Inheritance and linkage relationships of nine isozyme loci in wild radish
## 4 USE OF ADENINE-NUCLEOTIDE DERIVATIVES TO ASSESS POTENTIAL OF EXO-ACTIVE-SITE-DIRECTED REAGENTS AS SPECIES-SPECIFIC OR ISOZYME-SPECIFIC ENZYME INACTIVATORS .2. ISOZYME-SPECIFIC INACTIVATION OF A MAMMALIAN ENZYME AND ITS SIGNIFICANCE IN POSSIBLE DESIGN OF FETAL ISOENZYME TARGETED ANTI-NEOPLASTIC AGENTS
## 5 INDUCTION, PURIFICATION, AND CHARACTERIZATION OF CYTOCHROME-P450IIE
## 6 Isoenzyme polymorphism of some grapevine (Vitis vinifera L.) cultivars
## Year Author Journal
## 1 1994 LIU, TZ CANCER LETTERS
## 2 1982 HAMPTON, A JOURNAL OF MEDICINAL CHEMISTRY
## 3 1997 Conner, JK JOURNAL OF HEREDITY
## 4 1978 HAMPTON, A JOURNAL OF MEDICINAL CHEMISTRY
## 5 1991 YANG, CS METHODS IN ENZYMOLOGY
## 6 2004 Jahnke, GG PROCEEDINGS OF THE 1ST INTERNATIONAL SYMPOSIUM ON GRAPEVINE GROWING, COMMERCE AND RESEARCH
## Keywords Article_type
## 1 HEPATOMA CELLS Article
## 2 <NA> Article
## 3 <NA> Article
## 4 <NA> Article
## 5 <NA> Review
## 6 isoenzyme Proceedings Paper
df_api_year<-count(df_api,Year) %>% arrange(.,Year) %>% mutate(PercentPerYearGeneral=cumsum(n)/sum(n)*100)
colnames(df_api_year) <- c("Year","ArticlesGeneral","PercentPerYearGeneral")
df_api_year
## # A tibble: 59 x 3
## Year ArticlesGeneral PercentPerYearGeneral
## <int> <int> <dbl>
## 1 1960 1 0.00259
## 2 1962 9 0.0259
## 3 1963 16 0.0674
## 4 1964 30 0.145
## 5 1965 31 0.225
## 6 1966 45 0.342
## 7 1967 43 0.453
## 8 1968 84 0.671
## 9 1969 103 0.938
## 10 1970 84 1.16
## # ... with 49 more rows
write.table(df_api_year,'../broadersearchers/ProductionPerYearForAllozymeGeneralSearch_api',row.names=F,quote=F,sep='\t')
df_api_year$Year <- as.character(df_api_year$Year)
#have to merge them with earlier dataset
df1<-citations_ana.sum$AnnualProduction
colnames(df1) <- c("Year", "ArticlesParasite")
df1<-df1 %>% arrange(.,Year) %>% mutate(PercentPerYearParasites=cumsum(ArticlesParasite)/sum(ArticlesParasite)*100)
df1$Year <- as.character(df1$Year)
dmerged<-full_join(df_api_year,df1,by='Year')
head(dmerged)
## # A tibble: 6 x 5
## Year ArticlesGeneral PercentPerYearGeneral ArticlesParasite PercentPerYearParasites
## <chr> <int> <dbl> <int> <dbl>
## 1 1960 1 0.00259 NA NA
## 2 1962 9 0.0259 NA NA
## 3 1963 16 0.0674 NA NA
## 4 1964 30 0.145 NA NA
## 5 1965 31 0.225 NA NA
## 6 1966 45 0.342 1 0.0664
#NA should be 0
dmerged[is.na(dmerged)] <- 0
dmerged$Year<-as.integer(dmerged$Year)
#let's drop 2019 because it's a bit of a dumb point (the year is incomplete)
dmerged %>% select(.,ArticlesGeneral,ArticlesParasite,Year) %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 1:2) %>% ggplot(aes(Year, value)) +
geom_point(aes(colour = factor(id)),size = 1) +
geom_line(aes(colour = factor(id))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset",color = "Article Type\n") +
scale_y_continuous(trans='sqrt')
#for splines
dmerged<-dmerged %>% filter(.,Year!=2019)
spline_int <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesParasite))
spline_int2 <- as.data.frame(spline(dmerged$Year, dmerged$ArticlesGeneral))
spline_int$y[spline_int$y < 0] <- 0
ggplot(dmerged) +
geom_point(aes(dmerged$Year,dmerged$ArticlesGeneral), col='red',size = 1) +
geom_point(aes(dmerged$Year,dmerged$ArticlesParasite), col='blue',size = 1) +
geom_line(data = spline_int2, aes(x,y)) +
geom_area(data = spline_int2, aes(x,y,fill='blue')) +
geom_line(data = spline_int, aes(x,y)) +
geom_area(data = spline_int, aes(x,y,fill='red')) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset") +
scale_fill_manual(labels = c("Everyone", "Parasites"), values = alpha(c("red", "blue"),.6)) +
scale_y_continuous(trans='sqrt')
We can do a word cloud using the keywords from the API download
head(df_api$Keywords)
## [1] "HEPATOMA CELLS" NA NA NA NA "isoenzyme"
#treating each word individually, just splitting up multiple keywords into individual rows and removing search term words as before
df_api_Keywords <- df_api %>% select(.,Keywords) %>% separate_rows(.,Keywords,sep=",") %>% na.omit() %>% filter(!grepl('allozyme|electrophoresis|isoenzyme|isozyme|rapd|carbonic anhydrase|aflp|creatine kinase|protein kinase|alkaline phosphatase|cytochrome P450|glutathione S-transferase|alcohol dehydrogenase|lactate dehydrogenase|catalase|aldehyde dehydrogenase|hexokinase|peroxidase|5 alpha-reductase',Keywords,ignore.case = TRUE))
#needs a bit more cleaning for punctuation etc.; ignore the warnings here
forwordcloud.Corpus<-Corpus(VectorSource(df_api_Keywords))
forwordcloud.Corpus<- tm_map(forwordcloud.Corpus, removePunctuation)
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, removePunctuation): transformation drops documents
forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, removeNumbers)
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, removeNumbers): transformation drops documents
forwordcloud.Corpus <- tm_map(forwordcloud.Corpus, stripWhitespace)
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, stripWhitespace): transformation drops documents
forwordcloud.Corpus <- tm_map(forwordcloud.Corpus,content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(forwordcloud.Corpus, content_transformer(tolower)): transformation drops documents
#inspect(forwordcloud.Corpus)
#limit word output
wordcloud(forwordcloud.Corpus,max.words = 50,scale=c(2.0,.6))
wordcloud(forwordcloud.Corpus,max.words = 50,colors=brewer.pal(8, "Dark2"),scale=c(2.0,.6))
#may need some more cleaning of terms
Again, we can also treat each keyword phrase as a single term.
df_api_Keywords_count<-df_api_Keywords %>% count(Keywords) %>% arrange(.,desc(n)) %>% slice(.,1:100) %>% filter(!grepl(1,Keywords))
wordcloud(tolower(df_api_Keywords_count$Keywords),as.numeric(df_api_Keywords_count$n), colors="black",max.words=30,scale=c(1.4,.6))
#again we have some double ups
df_api_Keywords_count<-df_api_Keywords_count %>% mutate(fixkeyword=sub("Genetic diversity", "genetic diversity", Keywords)) %>% mutate(fixkeyword=sub("Antioxidant enzymes", "antioxidant enzymes", fixkeyword)) %>% group_by(.,fixkeyword) %>% summarise(n = sum(n)) %>% arrange(.,desc(n))
#colours and font
wordcloud(tolower(df_api_Keywords_count$fixkeyword),as.numeric(df_api_Keywords_count$n), colors=brewer.pal(8, "Set1"),max.words=30,scale=c(1.4,.6))
wordcloud(tolower(df_api_Keywords_count$fixkeyword),as.numeric(df_api_Keywords_count$n), colors=brewer.pal(8, "Dark2"),vfont=c("script","bold"),max.words=30,rot.per=0,scale=c(1.6,.6))
wordcloud(tolower(df_api_Keywords_count$fixkeyword),as.numeric(df_api_Keywords_count$n), colors=brewer.pal(8, "Dark2"),family = "mono",font = 2,max.words=30,scale=c(1.4,.6))
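Double-ups that differ only in capitalisation can be avoided by lower-casing before counting, instead of fixing them one by one afterwards. A minimal base R sketch, assuming the keywords are comma-separated as in the API file; the keyword strings below are made up.

```r
# Toy keyword field: one string per paper, keywords separated by commas
raw <- c("Genetic diversity, host range", "genetic diversity", NA,
         "Host range,  genetic diversity")

# Split on commas, trim white space, lower-case, and drop empty/NA entries
parts <- unlist(strsplit(raw[!is.na(raw)], ",", fixed = TRUE))
parts <- tolower(trimws(parts))
parts <- parts[nzchar(parts)]

# Count each phrase and sort from most to least common
counts <- sort(table(parts), decreasing = TRUE)
counts
```

The names and values of counts can then be used as the words and frequencies in wordcloud.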
In your searches you will end up with multiple sets of downloads for parasite sets from WoS (using the 500 at a time approach). Make sure you move these into separate folders for each search term so they don't get mixed up!
To bring in a second set of .bib files, repeat what we did above.
file_list2<-list.files(path='../nonmedicalparasites/',pattern='*.bib',full.names=T)
citations_nonmed<-readFilesmod(dput(as.character(file_list2)))
## c("../nonmedicalparasites//savedrecs(4).bib", "../nonmedicalparasites//savedrecs(5).bib"
## )
citations_nonmed_df <- convert2df(citations_nonmed, dbsource = "isi", format = "bibtex")
##
## Converting your isi collection into a bibliographic dataframe
##
## Articles extracted 100
## Articles extracted 200
## Articles extracted 300
## Articles extracted 400
## Articles extracted 500
## Articles extracted 600
## Articles extracted 614
## Done!
##
##
## Generating affiliation field tag AU_UN from C1: Done!
citations_nomed_ana <- biblioAnalysis(citations_nonmed_df, sep = ";")
citations_nomed_ana.sum <- summary(object = citations_nomed_ana, k = 100, pause = FALSE)
##
##
## Main Information about data
##
## Documents 614
## Sources (Journals, Books, etc.) 262
## Keywords Plus (ID) 2009
## Author's Keywords (DE) 1472
## Period 1966 - 2018
## Average citations per documents 26.86
##
## Authors 1680
## Author Appearances 2314
## Authors of single authored documents 44
## Authors of multi authored documents 1636
##
## Documents per Author 0.365
## Authors per Document 2.74
## Co-Authors per Documents 3.77
## Collaboration Index 3
##
##
## Annual Scientific Production
##
## Year Articles
## 1966 1
## 1975 2
## 1978 3
## 1979 1
## 1980 2
## 1982 2
## 1983 2
## 1984 2
## 1985 1
## 1986 4
## 1987 6
## 1988 1
## 1989 4
## 1990 3
## 1991 30
## 1992 36
## 1993 32
## 1994 27
## 1995 24
## 1996 25
## 1997 42
## 1998 43
## 1999 32
## 2000 31
## 2001 23
## 2002 27
## 2003 27
## 2004 29
## 2005 15
## 2006 15
## 2007 17
## 2008 14
## 2009 19
## 2010 8
## 2011 7
## 2012 7
## 2013 9
## 2014 4
## 2015 13
## 2016 10
## 2017 9
## 2018 5
##
## Annual Percentage Growth Rate 4.003523
##
##
## Most Productive Authors
##
## Authors Articles Authors Articles Fractionalized
## 1 MATTIUCCI S 19 CLARK CG 4.00
## 2 NASCETTI G 16 ANDREWS RH 3.92
## 3 ANDREWS RH 14 PANIAGUA E 3.75
## 4 PANIAGUA E 12 GOKA K 3.67
## 5 GATTI S 11 VILAS R 3.50
## 6 SCAGLIA M 11 BEVERIDGE I 3.28
## 7 VILAS R 11 PETRI WA 3.14
## 8 KOBAYASHI S 10 MATTIUCCI S 3.13
## 9 POZIO E 10 TAKAFUJI A 3.00
## 10 BEVERIDGE I 9 NAVAJAS M 2.87
## 11 CHILTON NB 9 VERDYCK P 2.70
## 12 HAQUE R 9 MIRELMAN D 2.62
## 13 SNABEL V 9 CHILTON NB 2.49
## 14 TAKEUCHI T 9 NASCETTI G 2.49
## 15 BRUNO A 8 MONIS PT 2.33
## 16 CEVINI C 8 HAQUE R 2.30
## 17 GOKA K 8 SANMARTIN ML 2.25
## 18 NAVAJAS M 8 BLAIR D 2.17
## 19 TACHIBANA H 8 POZIO E 2.11
## 20 CLARK CG 7 DIAMOND LS 2.03
## 21 PAOLETTI M 7 EBERT D 2.03
## 22 SANMARTIN ML 7 OROZCO E 2.03
## 23 TAKAFUJI A 7 DUFFY JE 2.00
## 24 CABARET J 6 RANNALA BH 2.00
## 25 CIPRIANI P 6 CABARET J 1.98
## 26 MIRELMAN D 6 KOBAYASHI S 1.88
## 27 PASTEUR N 6 SARGEAUNT PG 1.87
## 28 PETRI WA 6 LITTLE TJ 1.83
## 29 RENAUD F 6 GATTI S 1.81
## 30 SARGEAUNT PG 6 SCAGLIA M 1.81
## 31 BLAIR D 5 TAKEUCHI T 1.81
## 32 DIAMOND LS 5 YAMAZAKI Y 1.75
## 33 EBERT D 5 SNABEL V 1.75
## 34 MAYRHOFER G 5 WILLIAMS JE 1.70
## 35 MONIS PT 5 JACKSON TFHG 1.68
## 36 PHILLIPS CB 5 STOUTHAMER R 1.67
## 37 ROMAN B 5 PHILLIPS CB 1.59
## 38 SATOVIC Z 5 EDWARDS DD 1.53
## 39 TSAGKARAKOU A 5 BURDON JJ 1.50
## 40 WEBB SC 5 CLAY K 1.50
## 41 WILLIAMS JE 5 CROFT BA 1.50
## 42 CASTILLO P 4 ILINE II 1.45
## 43 CUBERO JI 4 PASTEUR N 1.45
## 44 DARDE ML 4 ANTOLIN MF 1.33
## 45 DUBINSKY P 4 BOHONAK AJ 1.33
## 46 DUCHENE M 4 FERGUSON DJP 1.33
## 47 EDWARDS DD 4 GOTOH T 1.33
## 48 GOTOH T 4 NEVO E 1.33
## 49 HALL A 4 OSAKABE M 1.33
## 50 ILINE II 4 TOMAVO S 1.33
## 51 JACKSON TFHG 4 CEVINI C 1.33
## 52 LAROSA G 4 D'AMELIO S 1.31
## 53 LITTLE TJ 4 TACHIBANA H 1.31
## 54 OROZCO E 4 THOMPSON RCA 1.25
## 55 OSAKABE M 4 MAYRHOFER G 1.23
## 56 RUBIALES D 4 BRUNO A 1.22
## 57 STOUTHAMER R 4 STANLEY SL 1.20
## 58 SUZUKI J 4 VRIJENHOEK RC 1.20
## 59 THOMPSON RCA 4 CLOSE RL 1.17
## 60 TORRES AM 4 DARDE ML 1.17
## 61 VERDYCK P 4 PAOLETTI M 1.16
## 62 VOVLAS N 4 RENAUD F 1.16
## 63 WIEDERMANN G 4 TSAGKARAKOU A 1.12
## 64 YANAGI T 4 BURCHARD GD 1.08
## 65 ABAUNZA P 3 DUCHENE M 1.03
## 66 AGUIRRE A 3 BARRETT J 1.00
## 67 AVANZATI AM 3 BIASIOLO A 1.00
## 68 BARATTI M 3 BLANC DS 1.00
## 69 BELLISARIO B 3 BRUCKNER DA 1.00
## 70 BERECZKI J 3 BRYANT C 1.00
## 71 BERNINI F 3 CHACIN-BONILLA L 1.00
## 72 BERNUZZI AM 3 CLOUTMAN DG 1.00
## 73 BINDER M 3 DESSER SS 1.00
## 74 BOOMSMA JJ 3 DILLS WL 1.00
## 75 BOUTEILLE B 3 DUNLEY JE 1.00
## 76 BRACHA R 3 FORD BA 1.00
## 77 BRISCOE DA 3 FUSU L 1.00
## 78 BULLINI L 3 GARDNER JPA 1.00
## 79 BURCHARD GD 3 GARDNER SL 1.00
## 80 CARNEIRO RMDG 3 GRANT WN 1.00
## 81 CIANCHI R 3 GREENSTONE MH 1.00
## 82 CLOSE RL 3 HAASE M 1.00
## 83 CROFT BA 3 HAFNER MS 1.00
## 84 D'AMELIO S 3 HALL GS 1.00
## 85 DEMEEUS T 3 HINOMOTO N 1.00
## 86 DURAND P 3 HOSHIZAKI S 1.00
## 87 EISENBACK JD 3 HSIAO TH 1.00
## 88 EY PL 3 JAZAYERI JA 1.00
## 89 GIBSON DI 3 JEROME CA 1.00
## 90 GONZALEZRUIZ A 3 JOHANNESEN J 1.00
## 91 GUHL F 3 JOHNSON SG 1.00
## 92 HANZELOVA V 3 KANG NJ 1.00
## 93 HEINZE J 3 KAZMER DJ 1.00
## 94 KARSSEN G 3 KITASHIMA Y 1.00
## 95 KITASHIMA Y 3 KOLLAR A 1.00
## 96 LA ROSA G 3 LEBER AL 1.00
## 97 LAGNEL J 3 LEUCHTMANN A 1.00
## 98 LYMBERY AJ 3 LODE T 1.00
## 99 MARCHI L 3 LYMBERY AJ 1.00
## 100 MELONI BP 3 MARTIN FN 1.00
##
##
## Top manuscripts per citations
##
## Paper TC TCperYear
## 1 LINHART YB, 1996, ANNU REV ECOL SYST 798 36.27
## 2 DIAMOND LS, 1993, J EUKARYOT MICROBIOL 381 15.24
## 3 ARNAUD-HAOND S, 2007, MOL ECOL 336 30.55
## 4 SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG-a 298 7.45
## 5 DARDE ML, 1992, J PARASITOL 198 7.62
## 6 MATTIUCCI S, 1997, J PARASITOL 197 9.38
## 7 MONIS PT, 2003, INFECT GENET EVOL 189 12.60
## 8 MOLLER AP, 1998, BEHAV ECOL SOCIOBIOL 188 9.40
## 9 BLACK WC, 1992, BULL ENTOMOL RES 187 7.19
## 10 CLARK CG, 1991, MOL BIOCHEM PARASITOL 171 6.33
## 11 HAQUE R, 1998, J CLIN MICROBIOL 166 8.30
## 12 SOLTIS DE, 1991, AM J BOT-a 166 6.15
## 13 DYBDAHL MF, 1996, EVOLUTION 165 7.50
## 14 ZARLENGA DS, 1999, INT J PARASIT 162 8.53
## 15 LOXDALE HD, 1998, BULL ENTOMOL RES 154 7.70
## 16 SCHWARZ D, 2005, NATURE 151 11.62
## 17 BURDON JJ, 1993, ANNU REV PHYTOPATHOL 140 5.60
## 18 EBERT D, 1998, PROC R SOC B-BIOL SCI 137 6.85
## 19 LESSA EP, 1998, MOL PHYLOGENET EVOL 126 6.30
## 20 THOMAS Y, 2003, EVOLUTION 124 8.27
## 21 HAQUE R, 1995, J CLIN MICROBIOL 120 5.22
## 22 BAYMAN P, 1991, CAN J BOT -REV CAN BOT 112 4.15
## 23 SMITH MA, 2008, MOL ECOL RESOUR 111 11.10
## 24 NEFF BD, 2001, EVOLUTION 111 6.53
## 25 ZIJLSTRA C, 1995, PHYTOPATHOLOGY 111 4.83
## 26 TANNICH E, 1991, J CLIN MICROBIOL 110 4.07
## 27 NEVO E, 1998, GENET RESOUR CROP EVOL 107 5.35
## 28 MIRELMAN D, 1986, INFECT IMMUN 104 3.25
## 29 SARGEAUNT PG, 1978, TRANS ROY SOC TROP MED HYG 104 2.60
## 30 ACUNASOTO R, 1993, AM J TROP MED HYG 103 4.12
## 31 VALENTINI A, 2006, J PARASITOL 102 8.50
## 32 NAVAJAS M, 2000, EXP APPL ACAROL-a 102 5.67
## 33 NADLER SA, 2011, PARASITOLOGY 99 14.14
## 34 MARTIN FN, 2000, MYCOLOGIA 95 5.28
## 35 OUDEMANS P, 1991, MYCOL RES 91 3.37
## 36 MIRELMAN D, 1986, EXP PARASITOL 91 2.84
## 37 MATTIUCCI S, 2002, SYST PARASITOL 88 5.50
## 38 MATTIUCCI S, 2004, J FISH BIOL 83 5.93
## 39 GELTER HP, 1992, BEHAV ECOL SOCIOBIOL 82 3.15
## 40 SUZUKI T, 2005, CURR MED CHEM 78 6.00
## 41 BANDI C, 1995, PARASITOLOGY 78 3.39
## 42 LITTLE TJ, 1999, J ANIM ECOL 77 4.05
## 43 DUNCAN AB, 2007, EVOLUTION 76 6.91
## 44 CARMONA JA, 1997, GENETICS 76 3.62
## 45 JAMES TY, 1999, EVOLUTION 75 3.95
## 46 MELONI BP, 1995, J PARASITOL 73 3.17
## 47 GONZALEZRUIZ A, 1994, J CLIN PATHOL 73 3.04
## 48 MOPPER S, 2000, ECOLOGY 72 4.00
## 49 DELMOTTE F, 1999, HEREDITY 72 3.79
## 50 HAQUE R, 1993, J INFECT DIS 71 2.84
## 51 FERGUSON DJP, 2004, INT J PARASIT 69 4.93
## 52 SOLTIS DE, 1991, AM J BOT 68 2.52
## 53 TACHIBANA H, 1991, J CLIN MICROBIOL 67 2.48
## 54 SARGEAUNT PG, 1980, TRANS ROY SOC TROP MED HYG 67 1.76
## 55 POZIO E, 2002, INT J PARASIT 66 4.12
## 56 GARDNER JPA, 1994, ARCH HYDROBIOL 65 2.71
## 57 PATERSON AM, 2000, SYST BIOL 64 3.56
## 58 MATTIUCCI S, 2014, J PARASITOL 62 15.50
## 59 NAGANO I, 1999, INT J PARASIT 62 3.26
## 60 DARDE ML, 1998, J CLIN MICROBIOL 62 3.10
## 61 CODJIA V, 1993, ACTA TROP 62 2.48
## 62 ABAUNZA P, 2008, FISH RES 61 6.10
## 63 HAAG CR, 2005, GENETICS 61 4.69
## 64 ANTOLIN MF, 1999, RES POPUL ECOL 61 3.21
## 65 CLAY K, 1993, AGRIC ECOSYST ENVIRON 61 2.44
## 66 IRUSEN EM, 1992, CLIN INFECT DIS 60 2.31
## 67 BOLLINGER EK, 1991, BEHAV ECOL SOCIOBIOL 59 2.19
## 68 GREENSTONE MH, 2006, BULL ENTOMOL RES 58 4.83
## 69 FERGUSON DJP, 2002, INT J PARASIT 58 3.62
## 70 ASIEGBU FO, 1994, PHYSIOL MOL PLANT PATHOL 58 2.42
## 71 MATTIUCCI S, 2001, INT J PARASIT 57 3.35
## 72 MEAGHER S, 1999, EVOLUTION 57 3.00
## 73 DUFFY JE, 1996, EVOLUTION 57 2.59
## 74 DUFFY JE, 1993, MAR BIOL 57 2.28
## 75 FEDER JL, 1997, EVOLUTION 56 2.67
## 76 ANDREWS RH, 1992, PARASITOLOGY 56 2.15
## 77 QUINN TP, 1987, CAN J FISH AQUAT SCI 56 1.81
## 78 EMELIANOV I, 2003, J EVOL BIOL 55 3.67
## 79 MATTIUCCI S, 2008, FISH RES 54 5.40
## 80 MATTIUCCI S, 2009, SYST PARASITOL 53 5.89
## 81 VILLEMANT C, 2007, SYST ENTOMOL 53 4.82
## 82 CHILTON NB, 1992, INT J PARASIT-a 53 2.04
## 83 HUANG HW, 1998, AM J BOT 52 2.60
## 84 MOLBO D, 1996, PROC R SOC B-BIOL SCI 51 2.32
## 85 CHABOUDEZ P, 1995, OECOLOGIA 51 2.22
## 86 MATTIUCCI S, 2007, VET PARASITOL 50 4.55
## 87 GRANT WN, 1994, INT J PARASIT 50 2.08
## 88 TOMAVO S, 2001, INT J PARASIT 49 2.88
## 89 MILGROOM MG, 1995, PHYTOPATHOLOGY 49 2.13
## 90 ALS TD, 2002, ECOL ENTOMOL 48 3.00
## 91 CHEN W, 1992, PHYTOPATHOLOGY 48 1.85
## 92 JEROME CA, 2002, MOL ECOL 47 2.94
## 93 UESUGI R, 2002, J ECON ENTOMOL 46 2.88
## 94 HEDRICK PW, 1998, EVOLUTION 46 2.30
## 95 SCHULTZ TR, 1998, INSECT SOC 46 2.30
## 96 BRITTEN D, 1997, J CLIN MICROBIOL 46 2.19
## 97 NEVO E, 1994, HEREDITY 45 1.88
## 98 MITCHELL SE, 2004, ECOL LETT 44 3.14
## 99 WEEKS AR, 1995, EXP APPL ACAROL 44 1.91
## 100 BURCH DJ, 1991, J CLIN MICROBIOL 44 1.63
##
##
## Most Productive Countries (of corresponding authors)
##
## Country Articles Freq SCP MCP MCP_Ratio
## 1 USA 102 0.1735 83 19 0.1863
## 2 UNITED KINGDOM 49 0.0833 31 18 0.3673
## 3 ITALY 45 0.0765 27 18 0.4000
## 4 FRANCE 44 0.0748 30 14 0.3182
## 5 AUSTRALIA 42 0.0714 38 4 0.0952
## 6 JAPAN 41 0.0697 35 6 0.1463
## 7 SPAIN 31 0.0527 18 13 0.4194
## 8 GERMANY 19 0.0323 14 5 0.2632
## 9 BRAZIL 18 0.0306 14 4 0.2222
## 10 CANADA 18 0.0306 14 4 0.2222
## 11 MEXICO 13 0.0221 11 2 0.1538
## 12 SWITZERLAND 12 0.0204 8 4 0.3333
## 13 NEW ZEALAND 10 0.0170 10 0 0.0000
## 14 AUSTRIA 9 0.0153 7 2 0.2222
## 15 INDIA 9 0.0153 9 0 0.0000
## 16 SLOVAKIA 8 0.0136 0 8 1.0000
## 17 BELGIUM 7 0.0119 5 2 0.2857
## 18 CHINA 7 0.0119 6 1 0.1429
## 19 ISRAEL 7 0.0119 5 2 0.2857
## 20 KENYA 7 0.0119 4 3 0.4286
## 21 NETHERLANDS 7 0.0119 2 5 0.7143
## 22 SOUTH AFRICA 6 0.0102 5 1 0.1667
## 23 EGYPT 5 0.0085 3 2 0.4000
## 24 POLAND 5 0.0085 3 2 0.4000
## 25 SWEDEN 5 0.0085 5 0 0.0000
## 26 TURKEY 5 0.0085 5 0 0.0000
## 27 DENMARK 4 0.0068 2 2 0.5000
## 28 FINLAND 4 0.0068 2 2 0.5000
## 29 HUNGARY 4 0.0068 4 0 0.0000
## 30 THAILAND 4 0.0068 1 3 0.7500
## 31 VENEZUELA 4 0.0068 4 0 0.0000
## 32 ARGENTINA 3 0.0051 2 1 0.3333
## 33 CZECH REPUBLIC 3 0.0051 1 2 0.6667
## 34 SERBIA 3 0.0051 2 1 0.3333
## 35 TAIWAN 3 0.0051 0 3 1.0000
## 36 BANGLADESH 2 0.0034 1 1 0.5000
## 37 BULGARIA 2 0.0034 2 0 0.0000
## 38 CHILE 2 0.0034 0 2 1.0000
## 39 IRAN 2 0.0034 1 1 0.5000
## 40 IRELAND 2 0.0034 1 1 0.5000
## 41 KOREA 2 0.0034 2 0 0.0000
## 42 PORTUGAL 2 0.0034 0 2 1.0000
## 43 ROMANIA 2 0.0034 2 0 0.0000
## 44 BAHAMAS 1 0.0017 0 1 1.0000
## 45 COLOMBIA 1 0.0017 1 0 0.0000
## 46 CROATIA 1 0.0017 0 1 1.0000
## 47 ESTONIA 1 0.0017 1 0 0.0000
## 48 GEORGIA 1 0.0017 0 1 1.0000
## 49 MAURITANIA 1 0.0017 1 0 0.0000
## 50 RUSSIA 1 0.0017 0 1 1.0000
## 51 SLOVENIA 1 0.0017 1 0 0.0000
## 52 URUGUAY 1 0.0017 0 1 1.0000
##
##
## SCP: Single Country Publications
##
## MCP: Multiple Country Publications
##
##
## Total Citations per Country
##
## Country Total Citations Average Article Citations
## 1 USA 4452 43.65
## 2 UNITED KINGDOM 1769 36.10
## 3 FRANCE 1492 33.91
## 4 ITALY 1337 29.71
## 5 AUSTRALIA 1209 28.79
## 6 JAPAN 699 17.05
## 7 CANADA 545 30.28
## 8 SPAIN 481 15.52
## 9 SWITZERLAND 390 32.50
## 10 ISRAEL 375 53.57
## 11 GERMANY 355 18.68
## 12 PORTUGAL 338 169.00
## 13 NETHERLANDS 174 24.86
## 14 SWEDEN 166 33.20
## 15 BRAZIL 160 8.89
## 16 NEW ZEALAND 155 15.50
## 17 SOUTH AFRICA 142 23.67
## 18 MEXICO 140 10.77
## 19 DENMARK 127 31.75
## 20 URUGUAY 126 126.00
## 21 KENYA 115 16.43
## 22 AUSTRIA 96 10.67
## 23 BANGLADESH 77 38.50
## 24 FINLAND 77 19.25
## 25 SLOVAKIA 65 8.12
## 26 TURKEY 64 12.80
## 27 POLAND 54 10.80
## 28 INDIA 47 5.22
## 29 VENEZUELA 42 10.50
## 30 THAILAND 39 9.75
## 31 KOREA 34 17.00
## 32 IRELAND 31 15.50
## 33 TAIWAN 28 9.33
## 34 BELGIUM 27 3.86
## 35 ARGENTINA 24 8.00
## 36 IRAN 24 12.00
## 37 HUNGARY 23 5.75
## 38 CHINA 18 2.57
## 39 EGYPT 18 3.60
## 40 GEORGIA 17 17.00
## 41 CZECH REPUBLIC 16 5.33
## 42 SERBIA 13 4.33
## 43 ROMANIA 12 6.00
## 44 BAHAMAS 8 8.00
## 45 COLOMBIA 5 5.00
## 46 ESTONIA 5 5.00
## 47 CHILE 3 1.50
## 48 SLOVENIA 2 2.00
## 49 BULGARIA 1 0.50
## 50 CROATIA 1 1.00
## 51 MAURITANIA 1 1.00
## 52 RUSSIA 0 0.00
##
##
## Most Relevant Sources
##
## Sources Articles
## 1 INTERNATIONAL JOURNAL FOR PARASITOLOGY 31
## 2 PARASITOLOGY RESEARCH 24
## 3 JOURNAL OF PARASITOLOGY 23
## 4 PARASITOLOGY 17
## 5 EVOLUTION 13
## 6 JOURNAL OF CLINICAL MICROBIOLOGY 10
## 7 SYSTEMATIC PARASITOLOGY 10
## 8 ANNALS OF THE ENTOMOLOGICAL SOCIETY OF AMERICA 9
## 9 APPLIED ENTOMOLOGY AND ZOOLOGY 9
## 10 EXPERIMENTAL AND APPLIED ACAROLOGY 9
## 11 TRANSACTIONS OF THE ROYAL SOCIETY OF TROPICAL MEDICINE AND HYGIENE 9
## 12 VETERINARY PARASITOLOGY 9
## 13 BULLETIN OF ENTOMOLOGICAL RESEARCH 8
## 14 EXPERIMENTAL \\& APPLIED ACAROLOGY 8
## 15 MOLECULAR ECOLOGY 8
## 16 PHYTOPATHOLOGY 8
## 17 AMERICAN JOURNAL OF TROPICAL MEDICINE AND HYGIENE 7
## 18 PLANT DISEASE 7
## 19 BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY 6
## 20 EXPERIMENTAL PARASITOLOGY 6
## 21 HEREDITY 6
## 22 JOURNAL OF MEDICAL ENTOMOLOGY 6
## 23 JOURNAL OF NEMATOLOGY 6
## 24 NEMATOLOGY 6
## 25 PARASITE-JOURNAL DE LA SOCIETE FRANCAISE DE PARASITOLOGIE 6
## 26 ACTA TROPICA 5
## 27 AMERICAN JOURNAL OF BOTANY 5
## 28 HELMINTHOLOGIA 5
## 29 JOURNAL OF PROTOZOOLOGY 5
## 30 MYCOLOGICAL RESEARCH 5
## 31 ANNALS OF TROPICAL MEDICINE AND PARASITOLOGY 4
## 32 ARCHIVES OF MEDICAL RESEARCH 4
## 33 BIOCHEMICAL SYSTEMATICS AND ECOLOGY 4
## 34 CANADIAN JOURNAL OF ZOOLOGY-REVUE CANADIENNE DE ZOOLOGIE 4
## 35 ICOPA IX - 9TH INTERNATIONAL CONGRESS OF PARASITOLOGY 4
## 36 JOURNAL OF EUKARYOTIC MICROBIOLOGY 4
## 37 JOURNAL OF EVOLUTIONARY BIOLOGY 4
## 38 MOLECULAR AND BIOCHEMICAL PARASITOLOGY 4
## 39 PLANT PATHOLOGY 4
## 40 PLOS ONE 4
## 41 ACTA PARASITOLOGICA 3
## 42 BEHAVIORAL ECOLOGY AND SOCIOBIOLOGY 3
## 43 BIOLOGICAL CONTROL 3
## 44 COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY B-BIOCHEMISTRY \\& MOLECULAR BIOLOGY 3
## 45 ENTOMOLOGIA EXPERIMENTALIS ET APPLICATA 3
## 46 EUROPEAN JOURNAL OF ENTOMOLOGY 3
## 47 FISHERIES RESEARCH 3
## 48 INDIAN JOURNAL OF MEDICAL RESEARCH SECTION A-INFECTIOUS DISEASES 3
## 49 INFECTIOUS AGENTS AND DISEASE-REVIEWS ISSUES AND COMMENTARY 3
## 50 INSECTES SOCIAUX 3
## 51 INTERNATIONAL JOURNAL OF FOOD MICROBIOLOGY 3
## 52 JAPANESE JOURNAL OF APPLIED ENTOMOLOGY AND ZOOLOGY 3
## 53 JOURNAL OF INFECTIOUS DISEASES 3
## 54 JOURNAL OF NATURAL HISTORY 3
## 55 JOURNAL OF ZOOLOGICAL SYSTEMATICS AND EVOLUTIONARY RESEARCH 3
## 56 MYCOLOGIA 3
## 57 NEMATROPICA 3
## 58 PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES 3
## 59 ARCHIVOS DE INVESTIGACION MEDICA 2
## 60 AUSTRALIAN JOURNAL OF ZOOLOGY 2
## 61 BELGIAN JOURNAL OF ZOOLOGY 2
## 62 BIOCHEMICAL GENETICS 2
## 63 CANADIAN JOURNAL OF BOTANY-REVUE CANADIENNE DE BOTANIQUE 2
## 64 CLINICAL MICROBIOLOGY REVIEWS 2
## 65 ECOLOGICAL ENTOMOLOGY 2
## 66 ENVIRONMENTAL BIOLOGY OF FISHES 2
## 67 EUPHYTICA 2
## 68 GENETICA 2
## 69 GENETICS 2
## 70 IN VITRO CELLULAR \\& DEVELOPMENTAL BIOLOGY-ANIMAL 2
## 71 INFECTION AND IMMUNITY 2
## 72 INFECTION GENETICS AND EVOLUTION 2
## 73 INTERNATIONAL JOURNAL OF ACAROLOGY 2
## 74 INVESTIGACION CLINICA 2
## 75 JOURNAL OF FISH BIOLOGY 2
## 76 JOURNAL OF GENERAL MICROBIOLOGY 2
## 77 JOURNAL OF HELMINTHOLOGY 2
## 78 JOURNAL OF HEREDITY 2
## 79 JOURNAL OF MAMMALOGY 2
## 80 JOURNAL OF ZOO AND WILDLIFE MEDICINE 2
## 81 MARINE BIOLOGY 2
## 82 MARINE ECOLOGY PROGRESS SERIES 2
## 83 MEMORIAS DO INSTITUTO OSWALDO CRUZ 2
## 84 PARASITOLOGY INTERNATIONAL 2
## 85 PHYSIOLOGICAL AND MOLECULAR PLANT PATHOLOGY 2
## 86 PROCEEDINGS OF THE ENTOMOLOGICAL SOCIETY OF WASHINGTON 2
## 87 RUSSIAN JOURNAL OF NEMATOLOGY 2
## 88 THEORETICAL AND APPLIED GENETICS 2
## 89 16TH INTERNATIONAL SCIENTIFIC COLLOQUIUM ON COFFEE VOLS I \\& II 1
## 90 2010 4TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING (ICBBE 2010) 1
## 91 ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY 1
## 92 ACTA PROTOZOOLOGICA 1
## 93 ACTA THERIOLOGICA 1
## 94 ACTA ZOOLOGICA BULGARICA 1
## 95 AFRICAN ENTOMOLOGY 1
## 96 AFRICAN JOURNAL OF BIOTECHNOLOGY 1
## 97 AFRICAN ZOOLOGY 1
## 98 AGRICULTURE ECOSYSTEMS \\& ENVIRONMENT 1
## 99 AMERICAN JOURNAL OF ENOLOGY AND VITICULTURE 1
## 100 AMERICAN MIDLAND NATURALIST 1
##
##
## Most Relevant Keywords
##
## Author Keywords (DE) Articles Keywords-Plus (ID) Articles
## 1 ALLOZYMES 60 DIFFERENTIATION 71
## 2 ALLOZYME 27 POPULATIONS 56
## 3 ELECTROPHORESIS 26 IDENTIFICATION 53
## 4 TAXONOMY 19 EVOLUTION 47
## 5 ENTAMOEBA HISTOLYTICA 18 DNA 31
## 6 GENE FLOW 16 ELECTROPHORESIS 29
## 7 RESISTANCE 15 ISOENZYME PATTERNS 27
## 8 ALLOZYME ELECTROPHORESIS 13 DIVERSITY 26
## 9 ENTAMOEBA DISPAR 13 PARASITES 25
## 10 PHYLOGENY 13 ALLOZYME 24
## 11 SPECIATION 13 NATURAL-POPULATIONS 23
## 12 ESTERASE 12 VARIABILITY 23
## 13 MORPHOLOGY 12 AMEBIASIS 21
## 14 POPULATION GENETICS 12 HOST 21
## 15 ISOENZYME 11 RESISTANCE 19
## 16 ISOZYMES 11 STRAINS 19
## 17 SYSTEMATICS 11 VIRULENCE 19
## 18 TETRANYCHUS URTICAE 11 POPULATION-STRUCTURE 18
## 19 GENETIC DIVERSITY 10 MITOCHONDRIAL-DNA 17
## 20 GENETIC STRUCTURE 10 POPULATION 17
## 21 GENETIC VARIATION 10 INFECTION 16
## 22 PARASITE 10 SEQUENCES 16
## 23 PCR 10 SYSTEMATICS 15
## 24 POPULATION STRUCTURE 10 ASCARIDOIDEA 14
## 25 DIAGNOSIS 9 DISTANCE 14
## 26 MITOCHONDRIAL DNA 9 ENTAMOEBA-HISTOLYTICA 14
## 27 MORPHOMETRICS 9 FLOW 14
## 28 POLYMORPHISM 9 GENETIC-VARIATION 14
## 29 AMEBIASIS 8 POLYMORPHISM 14
## 30 GENETICS 8 MARKERS 13
## 31 HYMENOPTERA 8 PATTERNS 13
## 32 RAPD 8 POLYMERASE CHAIN-REACTION 13
## 33 GENETIC VARIABILITY 7 POPULATION-GENETICS 13
## 34 HETEROZYGOSITY 7 SPECIATION 13
## 35 HYBRIDIZATION 7 ZYMODEMES 13
## 36 ISOENZYMES 7 BRAZIL 12
## 37 LOCAL ADAPTATION 7 DIAGNOSIS 12
## 38 POPULATION 7 GENETIC DIVERSITY 12
## 39 RFLP 7 GENETIC-STRUCTURE 12
## 40 TRICHINELLA 7 RIBOSOMAL DNA 12
## 41 VARIATION 7 COEVOLUTION 11
## 42 ZYMODEME 7 DISPAR 11
## 43 ENTAMOEBA-HISTOLYTICA 6 EXPRESSION 11
## 44 GENETIC 6 PARASITE 11
## 45 ISOZYME 6 DIVERGENCE 10
## 46 MOLECULAR 6 GENUS 10
## 47 PARASITES 6 HETEROZYGOSITY 10
## 48 SIBLING SPECIES 6 HOMOSEXUAL MEN 10
## 49 ACARI 5 HYMENOPTERA 10
## 50 BIOLOGICAL CONTROL 5 ORIGIN 10
## 51 EPIDEMIOLOGY 5 RHAGOLETIS-POMONELLA 10
## 52 GENETIC DIFFERENTIATION 5 SELECTION 10
## 53 GIARDIA 5 ACARI 9
## 54 IDENTIFICATION 5 AXENIC CULTIVATION 9
## 55 MELOIDOGYNE 5 CRYPHONECTRIA-PARASITICA 9
## 56 MICROSATELLITES 5 DNA PROBES 9
## 57 PARASITISM 5 DROSOPHILA-MELANOGASTER 9
## 58 PARTHENOGENESIS 5 FISH 9
## 59 PATHOGENICITY 5 ISOZYME 9
## 60 ROOT-KNOT NEMATODE 5 NEMATODA 9
## 61 SELECTION 5 PARASITISM 9
## 62 SEXUAL REPRODUCTION 5 PLANT-PARASITIC NEMATODES 9
## 63 SPECIES 5 RIBOSOMAL-RNA 9
## 64 TOXOPLASMA GONDII 5 SURFACE-ANTIGEN 9
## 65 ALLOZYME VARIATION 4 UNITED-STATES 9
## 66 CESTODA 4 BIOLOGICAL-CONTROL 8
## 67 CHARACTERIZATION 4 BIOLOGY 8
## 68 COEVOLUTION 4 ELECTROPHORETIC ISOENZYME PATTERNS 8
## 69 CONSERVATION 4 ENZYME PHENOTYPES 8
## 70 COSPECIATION 4 ENZYMES 8
## 71 CRYPTIC SPECIES 4 GENE FLOW 8
## 72 DIFFERENTIATION 4 PCR 8
## 73 EVOLUTION 4 POLYMERASE-CHAIN-REACTION 8
## 74 GENETIC DISTANCE 4 POLYMORPHISMS 8
## 75 HOST RACES 4 SEXUAL REPRODUCTION 8
## 76 HOST RANGE 4 STRAIN 8
## 77 HOST SPECIFICITY 4 ALLOZYME ANALYSIS 7
## 78 INSECTA 4 ALLOZYME DATA 7
## 79 INTRASPECIFIC VARIATION 4 AMPLIFICATION 7
## 80 IXODES RICINUS 4 ASCARIDIDA 7
## 81 MALATE DEHYDROGENASE 4 ATLANTIC 7
## 82 MTDNA 4 DIPTERA 7
## 83 NEMATODE 4 GENETIC DIFFERENTIATION 7
## 84 POPULATION GENETIC STRUCTURE 4 GROWTH 7
## 85 RED QUEEN 4 INFECTIONS 7
## 86 REPRODUCTIVE ISOLATION 4 INSECTS 7
## 87 SHEEP 4 ISOENZYME ANALYSIS 7
## 88 AFLP 3 IXODIDAE 7
## 89 COLONIZATION 3 JAPAN 7
## 90 DIAGNOSTICS 3 LECTIN 7
## 91 DIPTERA 3 LEPIDOPTERA 7
## 92 DISTRIBUTION 3 LOCAL ADAPTATION 7
## 93 DIVERSITY 3 MONOCLONAL-ANTIBODIES 7
## 94 E. HISTOLYTICA 3 NONPATHOGENIC ENTAMOEBA-HISTOLYTICA 7
## 95 ENZYME ELECTROPHORESIS 3 PHYLOGENETIC-RELATIONSHIPS 7
## 96 FISH 3 PLANT 7
## 97 GENETIC DIVERGENCE 3 PROTEIN 7
## 98 GENETIC POLYMORPHISM 3 RAPD MARKERS 7
## 99 GEOGRAPHIC 3 SIMPLEX COMPLEX ASCARIDIDA 7
## 100 GEOGRAPHICAL VARIATION 3 SPIRALIS 7
#bar chart of top 10 countries
df_count_nomed<-data.frame(Country=as.character(citations_nomed_ana.sum$MostProdCountries$`Country `),Article_count=as.integer(citations_nomed_ana.sum$MostProdCountries$Articles)) %>% slice(.,1:10)
ggplot(df_count_nomed, aes(Country, Article_count)) +
geom_bar(stat = "identity",fill=brewer.pal(10, "Spectral")) +
coord_flip() +
theme_bw()
#with an "everyone else" category: sum the articles from all countries outside the top 10
vec<-as.data.frame(citations_nomed_ana$Countries,stringsAsFactors = F) %>% filter(!Tab %in% trimws(as.character(df_count_nomed$Country))) %>% select(.,Freq) %>% sum()
vec2<-data.frame(Country='OTHER',Article_count=as.integer(vec))
df_count_nomed<-rbind(df_count_nomed,vec2)
ggplot(df_count_nomed, aes(Country, Article_count)) +
geom_bar(stat = "identity",fill=brewer.pal(11, "Spectral")) +
coord_flip() +
theme_bw()
#write it out as a table
write.table(citations_nomed_ana.sum$MostProdCountries,'../nonmedicalparasites/TopProducingCountriesForAllozymeNonMediacalParasiteSearch',row.names=F,quote=F,sep='\t')
#to see when XX % of papers were published
table<-citations_nomed_ana.sum$AnnualProduction %>% mutate(cumsum=cumsum(Articles),cumper=cumsum(Articles)/sum(Articles)*100)
table
## Year Articles cumsum cumper
## 1 1966 1 1 0.1628664
## 2 1975 2 3 0.4885993
## 3 1978 3 6 0.9771987
## 4 1979 1 7 1.1400651
## 5 1980 2 9 1.4657980
## 6 1982 2 11 1.7915309
## 7 1983 2 13 2.1172638
## 8 1984 2 15 2.4429967
## 9 1985 1 16 2.6058632
## 10 1986 4 20 3.2573290
## 11 1987 6 26 4.2345277
## 12 1988 1 27 4.3973941
## 13 1989 4 31 5.0488599
## 14 1990 3 34 5.5374593
## 15 1991 30 64 10.4234528
## 16 1992 36 100 16.2866450
## 17 1993 32 132 21.4983713
## 18 1994 27 159 25.8957655
## 19 1995 24 183 29.8045603
## 20 1996 25 208 33.8762215
## 21 1997 42 250 40.7166124
## 22 1998 43 293 47.7198697
## 23 1999 32 325 52.9315961
## 24 2000 31 356 57.9804560
## 25 2001 23 379 61.7263844
## 26 2002 27 406 66.1237785
## 27 2003 27 433 70.5211726
## 28 2004 29 462 75.2442997
## 29 2005 15 477 77.6872964
## 30 2006 15 492 80.1302932
## 31 2007 17 509 82.8990228
## 32 2008 14 523 85.1791531
## 33 2009 19 542 88.2736156
## 34 2010 8 550 89.5765472
## 35 2011 7 557 90.7166124
## 36 2012 7 564 91.8566775
## 37 2013 9 573 93.3224756
## 38 2014 4 577 93.9739414
## 39 2015 13 590 96.0912052
## 40 2016 10 600 97.7198697
## 41 2017 9 609 99.1856678
## 42 2018 5 614 100.0000000
write.table(table,'../nonmedicalparasites/ProductionPerYearForNonMedicalParasites',row.names=F,quote=F,sep='\t')
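The cumulative percentage column lets you read off the year by which a given share of the papers had been published. Here is a minimal sketch of that lookup, with the yearly counts printed above retyped into a plain data frame. The names `prod` and `year_reaching()` are made up for this illustration and are not bibliometrix objects.

```r
# yearly article counts copied from the table printed above
prod <- data.frame(
  Year = c(1966, 1975, 1978, 1979, 1980, 1982:1990, 1991:2018),
  Articles = c(1, 2, 3, 1, 2, 2, 2, 2, 1, 4, 6, 1, 4, 3,
               30, 36, 32, 27, 24, 25, 42, 43, 32, 31, 23, 27, 27, 29,
               15, 15, 17, 14, 19, 8, 7, 7, 9, 4, 13, 10, 9, 5)
)

# cumulative percentage of all papers published up to and including each year
prod$cumper <- cumsum(prod$Articles) / sum(prod$Articles) * 100

# first year in which the cumulative share reaches `pct` percent
year_reaching <- function(df, pct) {
  df$Year[which(df$cumper >= pct)[1]]
}

year_reaching(prod, 50)
year_reaching(prod, 90)
```

With these counts, half of the papers had appeared by 1999 and 90% by 2011, which matches the `cumper` column in the table.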
ggplot(citations_nomed_ana.sum$AnnualProduction, aes(`Year `,Articles, group=1)) +
geom_point( size = 3,colour='red') +
geom_line() +
labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
#create some splines to smooth the curve
spline_int <- as.data.frame(spline(citations_nomed_ana.sum$AnnualProduction$`Year `, citations_nomed_ana.sum$AnnualProduction$Articles))
ggplot(citations_nomed_ana.sum$AnnualProduction) +
geom_point(aes(`Year `,Articles), size = 1) +
geom_line(data = spline_int, aes(x,y)) +
geom_area(data = spline_int, aes(x,y,fill='red'),alpha=0.6) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y='Article Number', fill="Subset") +
scale_fill_manual(labels = "Parasites", values = alpha("red",.6))
#word cloud
forwordcloud_nomed<-as.data.frame(cbind(as.character(trimws(citations_nomed_ana.sum$MostRelKeywords$`Author Keywords (DE) `, which = c("both", "left", "right"))),citations_nomed_ana.sum$MostRelKeywords[2]),stringsAsFactors=FALSE)
colnames(forwordcloud_nomed)<-c('keyword','count_papers')
forwordcloud_nomed<- forwordcloud_nomed %>% filter(!grepl('allozyme|electrophoresis|isoenzyme|isozyme|rapd|carbonic anhydrase|aflp|creatine kinase|protein kinase|alkaline phosphatase|cytochrome P450|glutathione S-transferase|alcohol dehydrogenase|lactate dehydrogenase|catalase|aldehyde dehydrogenase|hexokinase|peroxidase|5 alpha-reductase',keyword,ignore.case = TRUE))
#create corpus
forwordcloud_nomed<-forwordcloud_nomed %>% mutate(fixkeyword=sub("GENETICS", "GENETIC", keyword))
forwordcloud_nomed.Corpus<-Corpus(VectorSource(forwordcloud_nomed[rep(row.names(forwordcloud_nomed), forwordcloud_nomed$count_papers), 3]))
wordcloud(forwordcloud_nomed.Corpus,colors=brewer.pal(8, "Dark2"),max.words=30,scale=c(2.2,.6))
#whole phrases
wordcloud(tolower(forwordcloud_nomed$keyword),as.numeric(forwordcloud_nomed$count_papers), colors=brewer.pal(8, "Set1"),max.words=30,scale=c(1.2,.6))
wordcloud(tolower(forwordcloud_nomed$keyword),as.numeric(forwordcloud_nomed$count_papers), colors=brewer.pal(8, "Dark2"),vfont=c("script","bold"),max.words=30,rot.per=0,scale=c(1.5,.6))
wordcloud(tolower(forwordcloud_nomed$keyword),as.numeric(forwordcloud_nomed$count_papers), colors=brewer.pal(8, "Dark2"),family = "mono",font = 2,max.words=30,scale=c(1.3,.6))
Here I'm just joining it onto the dmerged table that we used earlier. It already has the parasite and broad-scale searches, so we just need to add the non-medical parasites.
df_nonmed<-citations_nomed_ana.sum$AnnualProduction
colnames(df_nonmed) <- c("Year", "ArticlesParasite_nonmed")
df_nonmed<-df_nonmed %>% arrange(.,Year) %>% mutate(PercentPerYearParasites_nonmed=cumsum(ArticlesParasite_nonmed)/sum(ArticlesParasite_nonmed)*100)
df_nonmed$Year <- as.character(df_nonmed$Year)
head(dmerged)
## # A tibble: 6 x 5
##    Year ArticlesGeneral PercentPerYearGeneral ArticlesParasite PercentPerYearParasites
##   <int>           <int>                 <dbl>            <dbl>                   <dbl>
## 1  1960               1               0.00259                0                       0
## 2  1962               9                0.0259                0                       0
## 3  1963              16                0.0674                0                       0
## 4  1964              30                 0.145                0                       0
## 5  1965              31                 0.225                0                       0
## 6  1966              45                 0.342                1                  0.0664
dmerged$Year <- as.character(dmerged$Year)
#full_join already keeps every Year from both tables (all=TRUE is a merge() argument, not a full_join one)
dmerged3<-full_join(dmerged,df_nonmed,by='Year')
dmerged3[is.na(dmerged3)] <- 0
dmerged3$Year<-as.integer(dmerged3$Year)
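To see what the join and the NA replacement are doing, here is a minimal sketch on two tiny made-up tables. The data frames `a` and `b` and their values are hypothetical and are not the search results.

```r
library(dplyr)

# two hypothetical yearly counts that do not cover the same years
a <- data.frame(Year = c("1990", "1991", "1992"), ArticlesA = c(3, 5, 2))
b <- data.frame(Year = c("1991", "1993"), ArticlesB = c(1, 4))

# full_join keeps every Year found in either table
joined <- full_join(a, b, by = "Year")

# years missing from one table come back as NA, so set them to zero
joined[is.na(joined)] <- 0

# back to integer and sorted by year, ready for plotting
joined$Year <- as.integer(joined$Year)
joined <- arrange(joined, Year)
joined
```

Each year appears once, and a search with no papers in a given year gets a count of zero rather than a gap.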
#let's drop 2019 because it's a bit of a dumb point
dmerged3 %>% select(.,ArticlesGeneral,ArticlesParasite,ArticlesParasite_nonmed,Year) %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 1:3) %>% ggplot(aes(Year, value)) +
geom_point(aes(colour = factor(id)),size = 1) +
geom_line(aes(colour = factor(id))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset",color = "Article Type\n") +
scale_y_continuous(trans='sqrt')
#for splines
dmerged3<-dmerged3 %>% filter(.,Year!=2019)
spline_int <- as.data.frame(spline(dmerged3$Year, dmerged3$ArticlesParasite))
spline_int2 <- as.data.frame(spline(dmerged3$Year, dmerged3$ArticlesGeneral))
spline_int3 <- as.data.frame(spline(dmerged3$Year, dmerged3$ArticlesParasite_nonmed))
spline_int$y[spline_int$y < 0] <- 0
spline_int3$y[spline_int3$y < 0] <- 0
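`spline()` draws a cubic curve through the yearly counts. Such a curve can dip below zero between points, which is why negative values are set to zero in the lines above. A small sketch with made-up counts follows; `yr` and `n_art` are hypothetical.

```r
# hypothetical yearly counts with a sharp jump, similar in shape to the real data
yr    <- 2000:2007
n_art <- c(0, 0, 1, 30, 36, 2, 0, 0)

# spline() returns a list with x and y; by default it evaluates
# the curve at 3 * length(x) evenly spaced points
sp <- as.data.frame(spline(yr, n_art))

# article counts cannot be negative, so clamp any undershoot to zero
sp$y <- pmax(sp$y, 0)

nrow(sp)
range(sp$x)
min(sp$y)
```

`pmax()` does the same job as the logical indexing used above. Either form leaves the non-negative values untouched.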
#the fill strings inside aes() are only keys: scale_fill_manual sorts them (blue, green, red) and assigns the labels and colours in that order, so the 'blue' key (all articles) is drawn red and labelled Everyone
ggplot(dmerged3) +
geom_point(aes(Year,ArticlesGeneral), col='red',size = 1) +
geom_point(aes(Year,ArticlesParasite), col='blue',size = 1) +
geom_point(aes(Year,ArticlesParasite_nonmed), col='green',size = 1) +
geom_line(data = spline_int2, aes(x,y)) +
geom_area(data = spline_int2, aes(x,y,fill='blue')) +
geom_line(data = spline_int, aes(x,y)) +
geom_area(data = spline_int, aes(x,y,fill='red')) +
geom_line(data = spline_int3, aes(x,y)) +
geom_area(data = spline_int3, aes(x,y,fill='green')) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Article Number'))), fill="Subset") +
scale_fill_manual(labels = c("Everyone", "Parasites not medical","Parasites"), values = alpha(c("red", "green","blue"),.6)) +
scale_y_continuous(trans='sqrt')
Have a play around with the data and see what you can find. I have only given you some broad ideas, so explore the data in your own way. The files from the 500-at-a-time download have a lot of other metadata that you could explore.
#depending on how the data comes down we can look at other things as well
head(citations_df$TC)
## [1] 0 0 0 1 0 0
dmerged.violin<-data.frame(citationcount=citations_df$TC,type='Parasite Search')
dmerged.violin<-rbind(dmerged.violin,data.frame(citationcount=citations_nonmed_df$TC,type='Parasite no medical'))
ggplot(dmerged.violin,aes(type,citationcount)) +
geom_violin(aes(fill = factor(type))) +
scale_y_continuous(trans='sqrt')+
labs(title="Citation count per article",x='Search Group', y=expression(sqrt(italic('Citation Count'))), fill="Subset") +
theme_bw()+
scale_fill_manual(values = alpha(c("red", "blue"),.6))
#could look at citation over the years
para_citationperyear<-citations_df %>% select(.,PY,TC) %>% group_by(PY) %>% tally(TC)
nomed_citationperyear<-citations_nonmed_df %>% select(.,PY,TC) %>% group_by(PY) %>% tally(TC)
colnames(para_citationperyear) <- c("Year", "CitationParasites")
colnames(nomed_citationperyear) <- c("Year", "CitationParasitesNoMedical")
dmerged.citationPY<-full_join(para_citationperyear,nomed_citationperyear,by='Year')
dmerged.citationPY[is.na(dmerged.citationPY)] <- 0
head(dmerged.citationPY)
## # A tibble: 6 x 3
##    Year CitationParasites CitationParasitesNoMedical
##   <dbl>             <dbl>                      <dbl>
## 1  1966                 4                          4
## 2  1968                 0                          0
## 3  1973                 0                          0
## 4  1974                29                          0
## 5  1975                13                         13
## 6  1977               258                          0
dmerged.citationPY %>% filter(.,Year!=2019) %>% tidyr::gather("id", "value", 2:3) %>% ggplot(aes(Year, value)) +
geom_point(aes(colour = factor(id)),size = 1) +
geom_line(aes(colour = factor(id))) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title="Allozymes",x='Year', y=expression(sqrt(italic('Citation Count'))), fill="Subset",color = "Article Type\n") +
scale_y_continuous(trans='sqrt')
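The same group-and-summarise pattern works for other fields in the converted table. As one idea, the sketch below ranks journals by how many papers they contributed and how often those papers were cited. It assumes the Web of Science style field tags used in the converted table (`SO` for the journal name, `TC` for times cited, `PY` for year). The `toy` data frame and its values are invented so that the example runs without the downloaded files.

```r
library(dplyr)

# invented stand-in for a converted table; swap in citations_df to use real data
toy <- data.frame(
  SO = c("PARASITOLOGY", "EVOLUTION", "PARASITOLOGY",
         "HEREDITY", "EVOLUTION", "PARASITOLOGY"),
  TC = c(10, 40, 5, 12, 20, 0),
  PY = c(1995, 1998, 2001, 2001, 2004, 2010)
)

# one row per journal: paper count, total citations and mean citations per paper
journal_summary <- toy %>%
  group_by(SO) %>%
  summarise(papers = n(),
            total_citations = sum(TC),
            mean_citations = mean(TC)) %>%
  arrange(desc(papers), desc(total_citations))

journal_summary
```

Replacing `SO` with `PY` in `group_by()` gives the same summary per publication year instead.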